A Sharp analog of Young's Inequality on $S^N$
and Related Entropy Inequalities



E. A. Carlen^1, E. H. Lieb^2, M. Loss^1

1. School of Mathematics, Georgia Tech, Atlanta GA 30332 

2. Departments of Mathematics and Physics, Jadwin Hall, 
Princeton University, P.O. Box 708, Princeton NJ 08544 

Abstract We prove a sharp analog of Young's inequality on $S^N$, and deduce from it
certain sharp entropy inequalities. The proof turns on constructing a nonlinear heat flow
that drives trial functions to optimizers in a monotonic manner. This strategy also works
for the generalization of Young's inequality on $\mathbb{R}^N$ to more than three functions, and leads
to significant new information about the optimizers and the constants.

Math reviews Classification Numbers: 43A15, 52A40, 82C40 

Key words: Inequalities, entropy, optimizers, best constants



Work partially supported by U.S. National Science Foundation grant DMS 03-00349. 
Work partially supported by U.S. National Science Foundation grant PHY-0139984. 
(c)2004 by the authors. Reproduction of this article, in its entirety, by any means is permitted for 
non-commercial purposes. 






1. Introduction 



This paper concerns further generalizations of the generalized Young's inequality due
to Brascamp and Lieb [5], which we now recall.

For any $N \ge M$, let $a_1, a_2, \dots, a_N$ be any set of $N$ non zero vectors spanning $\mathbb{R}^M$. Let
$f_1, f_2, \dots, f_N$ be any set of $N$ non negative measurable functions on $\mathbb{R}$. Given numbers
$p_j$ with $1 \le p_j \le \infty$ for $j = 1, 2, \dots, N$, form the vector

$$p = (1/p_1, 1/p_2, \dots, 1/p_N) , \tag{1.1}$$

and define

$$D(p) = \sup\left\{ \frac{\int_{\mathbb{R}^M} \prod_{j=1}^N f_j(a_j \cdot x)\, d^Mx}{\prod_{j=1}^N \|f_j\|_{p_j}} \ :\ f_j \in L^{p_j}(\mathbb{R}),\ j = 1, 2, \dots, N \right\} . \tag{1.2}$$



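A theorem in [5] reduces the computation of $D(p)$ to a finite dimensional variational
problem: Let $\mathcal{G}$ denote the set of all centered Gaussian functions on $\mathbb{R}$. That is, $g(x) \in \mathcal{G}$
if and only if $g(x) = c\,e^{-(sx)^2/2}$ for some $s > 0$ and some constant $c$. Define $D_G(p)$ by

$$D_G(p) = \sup\left\{ \frac{\int_{\mathbb{R}^M} \prod_{j=1}^N g_j(a_j \cdot x)\, d^Mx}{\prod_{j=1}^N \|g_j\|_{p_j}} \ :\ g_j \in \mathcal{G},\ j = 1, 2, \dots, N \right\} . \tag{1.3}$$

It is proved in [5] that $D(p) = D_G(p)$, and hence

$$\int_{\mathbb{R}^M} \prod_{j=1}^N f_j(a_j \cdot x)\, d^Mx \le D_G(p) \prod_{j=1}^N \|f_j\|_{p_j} . \tag{1.4}$$

This can be used to explicitly compute sharp constants in certain cases. For instance, when
$M = 2$ and $N = 3$, $D_G(p)$ may be evaluated, and this gives the sharp constant in the
classical Young's inequality for convolutions.*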

The first part of this paper concerns a version of this generalized Young's inequality for
functions on the sphere $S^{N-1}$. Our generalization was motivated by statistical mechanical
considerations, and was devised to prove a sharp entropy inequality for probability densities
on $S^{N-1}$ which is also presented below. There are by now several alternative proofs
of the Brascamp-Lieb inequality for functions on $\mathbb{R}^M$ (e.g., [8], [2] and [3]), but none of
these seem to be readily adaptable to the consideration of functions on $S^{N-1}$, and it was
necessary to develop a new approach.



* The sharp constant in Young's inequality for convolutions was obtained by Beckner at the same time
that Brascamp and Lieb obtained their more general result. Beckner's results do not address the case of
more than three functions, which is the focus here.

The new approach, it turns out, leads to a very simple proof of the original theorem
in $\mathbb{R}^M$, and enables us to strengthen the original theorem in several respects, clearing up
some questions left open by the authors cited above. In particular, we resolve a conjecture
of Barthe whose incisive work in [3] settled many of the questions about non negative
optimizers for the variational problem (1.2). We also obtain new information on the
constants. For any given choice of $\{a_1, \dots, a_N\}$, we give an explicit formula for the (unique)
choice of $p$ that minimizes $D(p)$, as well as the minimum value, which we refer to as the
"best best constant" in the generalized Young's inequality.

We shall proceed to these results along the path which led to them, and begin by
recalling some facts that motivated the investigation of a spherical analog of (1.4).

Let $\mu_N$ denote the uniform Borel probability measure on $S^{N-1}(\sqrt{N})$, the sphere of
radius $\sqrt{N}$ in $\mathbb{R}^N$, and let $\gamma_N$ denote the Gaussian probability measure

$$d\gamma_N = (2\pi)^{-N/2} e^{-|v|^2/2}\, d^Nv . \tag{1.5}$$

We can consider $\mu_N$ as a probability measure on $\mathbb{R}^N$, supported on $S^{N-1}(\sqrt{N})$, and then
it is a familiar fact that for large values of $N$, $d\gamma_N \approx d\mu_N$. In the considerations that
motivated our investigation, a vector

$$v = (v_1, v_2, \dots, v_N) \tag{1.6}$$

in $\mathbb{R}^N$ represents the velocities of $N$ one dimensional particles. Under any sort of evolution
of the particle system that conserves kinetic energy, $\sum_{j=1}^N v_j^2$ will be constant. Supposing
that its initial value is $N$, at any later time the state of the system will be given by a
point in $S^{N-1}(\sqrt{N})$. The uniform probability measure $\mu_N$ is called the microcanonical
ensemble in statistical mechanics. The probability measure $d\gamma_N$ on the other hand would
be called the canonical ensemble for this system. The principle of equivalence of ensembles
is a cornerstone of equilibrium statistical mechanics. For this simple system, it reduces to
the statement that for any fixed positive integer $k$, and any bounded measurable function
$\phi(v_1, v_2, \dots, v_k)$ of the first $k$ velocities only,

$$\lim_{N\to\infty} \left( \int \phi(v_1, v_2, \dots, v_k)\, d\mu_N - \int \phi(v_1, v_2, \dots, v_k)\, d\gamma_N \right) = 0 .$$

However, the equivalence of ensembles goes only so far. A fundamental qualitative
difference between $\gamma_N$ and $\mu_N$ is that under the first probability measure, the coordinate
functions are independent random variables, while under the second they are not. This
lack of independence has an important quantitative effect that does not diminish with
increasing $N$ if we consider functions of all of the velocities at once, as we now explain.

Before going further, it will be convenient to make a change of scale, and consider
the unit sphere. The factors of $\sqrt{N}$ that are necessary for comparison to the Gaussian
measure $d\gamma_N$ will not be helpful in the next paragraphs. Therefore, let $\mu$ denote the
uniform Borel probability measure on $S^{N-1}$, the unit sphere in $\mathbb{R}^N$. Let $\pi_j$ be the $j$th
coordinate function. That is,

$$\pi_j(v_1, v_2, \dots, v_N) = v_j \in [-1, 1] . \tag{1.7}$$

Consider functions $f_j$ defined on the interval $[-1, 1]$ and pull them back to the sphere via
the coordinate function $\pi_j$, i.e., $f_j(\pi_j(v))$. We denote this function also by $f_j$. It will be
clear from the context which of these functions is meant.

Because $\sum_{j=1}^N \pi_j(v)^2 = 1$, the coordinate functions are not independent random variables,
and hence, given $N$ functions $f_j$ on $[-1, 1]$, the quantities

$$\int_{S^{N-1}} \prod_{j=1}^N f_j(v_j)\, d\mu \qquad \text{and} \qquad \prod_{j=1}^N \int_{S^{N-1}} f_j(v_j)\, d\mu \tag{1.8}$$

need not be equal. In fact, simple examples show that it is possible for the integral on the
left in (1.8) to diverge while all of the integrals on the right are finite. However, according
to the following theorem, such a divergence is not possible if each $f_j$ is square integrable.

Indeed, the product of the $L^2$ norms of the $f_j$ controls the integral of $\prod_{j=1}^N f_j$ in the
strongest way that one could hope. In what follows, $\|\cdot\|_{L^p(S^{N-1})}$ will denote the $L^p$ norm
with respect to $\mu$ on $S^{N-1}$.

Theorem 1 For all $N \ge 2$, given non-negative measurable functions $f_j$, $j = 1, 2, \dots, N$,
on $[-1, 1]$,

$$\int_{S^{N-1}} \Big( \prod_{j=1}^N f_j(v_j) \Big)\, d\mu \le \prod_{j=1}^N \|f_j\|_{L^p(S^{N-1})} \tag{1.9}$$

for all $p \ge 2$. Moreover, for each $p < 2$, there exist functions $f_j$ so that $\|f_j\|_{L^p(S^{N-1})} < \infty$
for each $j$, while the integral on the left side of (1.9) diverges. Finally, for every $p \ge 2$ and
$N \ge 3$, there is equality in (1.9) if and only if each $f_j$ is constant.
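
As an illustration only, the following sketch checks (1.9) numerically for $N = 3$ and $p = 2$. The trial function $f(s) = e^{bs}$, the value $b = 1$ and the quadrature grid are our own hypothetical choices, not taken from the argument. For this choice both sides have closed forms: the left side is $\sinh(\sqrt{3}\,b)/(\sqrt{3}\,b)$ and the right side is $(\sinh(2b)/(2b))^{3/2}$.

```python
import math

def sphere_average(h, n_theta=200, n_phi=200):
    """Average h(v1, v2, v3) over the unit sphere S^2 using a midpoint rule
    in spherical coordinates (an assumed, simple quadrature)."""
    total = 0.0
    weight_sum = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * math.pi / n_theta
        w = math.sin(theta)  # surface measure is sin(theta) dtheta dphi
        for k in range(n_phi):
            phi = (k + 0.5) * 2.0 * math.pi / n_phi
            v1 = math.sin(theta) * math.cos(phi)
            v2 = math.sin(theta) * math.sin(phi)
            v3 = math.cos(theta)
            total += w * h(v1, v2, v3)
            weight_sum += w
    return total / weight_sum  # normalised so that constants average to themselves

b = 1.0                              # hypothetical parameter
f = lambda s: math.exp(b * s)        # hypothetical trial function on [-1, 1]

# Left side of (1.9): integral of f(v1) f(v2) f(v3) over S^2.
lhs = sphere_average(lambda v1, v2, v3: f(v1) * f(v2) * f(v3))
# Right side of (1.9) with p = 2; by symmetry all three norms coincide.
norm2 = math.sqrt(sphere_average(lambda v1, v2, v3: f(v3) ** 2))
rhs = norm2 ** 3

# Closed forms for this particular trial function.
lhs_exact = math.sinh(math.sqrt(3.0) * b) / (math.sqrt(3.0) * b)
rhs_exact = (math.sinh(2.0 * b) / (2.0 * b)) ** 1.5
print(lhs, rhs, lhs <= rhs)
```

A single trial function proves nothing, of course; the sketch only makes the two sides of (1.9) concrete.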

It is natural to refer to (1.9) as a spherical version of the generalized Young's inequality
(1.4). The resemblance is accentuated if we write $v_j = e_j \cdot v$ where the $e_j$ are the standard
basis vectors in $\mathbb{R}^N$. The proof that we give for Theorem 1 can be adapted to prove
a generalization in which vectors other than the $e_j$ are considered, but this is not
needed for the application that we now describe.

The inequality (1.9) implies a sharp entropy inequality for probability densities $F$ on
$S^{N-1}$. Indeed, let $F$ be any probability density on $S^{N-1}$, and then, for each $j = 1, 2, \dots, N$,
let $f_j$ denote the conditional expectation of $F$ given $v_j$. This is a linear operation, and we
define the operator $P_j$ by

$$P_jF = f_j .$$

In more analytic terms, $f_j$ is the function on $[-1, 1]$ so that for all bounded measurable
functions $\phi$ on $[-1, 1]$,

$$\int_{S^{N-1}} \phi(v_j) F(v)\, d\mu = \int_{S^{N-1}} \phi(v_j) f_j(v_j)\, d\mu .$$

As is evident from the definition, for square integrable $F$, $f_j = P_jF$ is just the orthogonal
projection in $L^2(S^{N-1})$ of $F$ onto the subspace consisting of functions depending only on
$v_j$; i.e., measurable with respect to the sigma algebra generated by $v_j$.

There is yet another relation worth bearing in mind. To explain, introduce the one
dimensional marginal $\nu_N$ of $\mu$: For $N \ge 3$, and any function $\phi$ on $[-1, 1]$,

$$\int_{S^{N-1}} \phi(v_1)\, d\mu = \int_{[-1,1]} \phi(v)\, d\nu_N$$

where

$$d\nu_N = \frac{|S^{N-2}|}{|S^{N-1}|} (1 - v^2)^{(N-3)/2}\, dv . \tag{1.10}$$

Here, $|S^{m-1}|$ denotes the surface area of the $m - 1$ dimensional unit sphere in $\mathbb{R}^m$; and
$|S^0| = 2$. Then, $f_j(v)\, d\nu_N$ is the marginal distribution of $v_j$ under $F(v)\, d\mu$. Whenever we
refer to the $j$th marginal $f_j$ of a probability density $F$ on $S^{N-1}$, we shall mean that $f_j$ is
related to $F$ in exactly this manner.
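
The marginal (1.10) can be sanity-checked numerically. The sketch below assumes the standard formula $|S^{m-1}| = 2\pi^{m/2}/\Gamma(m/2)$, which is not stated in the text. It verifies that $\nu_N$ has total mass one, and that the fourth moment of $\sqrt{N}\,v_1$ equals $3N/(N+2)$, which approaches the Gaussian value 3, in line with the equivalence of ensembles recalled earlier. The function names and the quadrature are our own choices.

```python
import math

def sphere_area(m):
    """Surface area |S^{m-1}| of the unit sphere in R^m (so sphere_area(1) is 2)."""
    return 2.0 * math.pi ** (m / 2.0) / math.gamma(m / 2.0)

def nu_density(v, N):
    """Density of nu_N with respect to dv on [-1, 1], following (1.10)."""
    return sphere_area(N - 1) / sphere_area(N) * (1.0 - v * v) ** ((N - 3) / 2.0)

def integrate(g, n=20000):
    """Midpoint rule on [-1, 1]."""
    h = 2.0 / n
    return h * sum(g(-1.0 + (i + 0.5) * h) for i in range(n))

checks = []
for N in (3, 4, 10, 50):
    mass = integrate(lambda v: nu_density(v, N))
    # Fourth moment of sqrt(N) * v_1; the standard Gaussian value is 3.
    fourth = integrate(lambda v: (math.sqrt(N) * v) ** 4 * nu_density(v, N))
    checks.append((N, mass, fourth))
    print(N, round(mass, 6), round(fourth, 4), round(3.0 * N / (N + 2), 4))
```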

Clearly, each of the $f_j$ is a probability density on $S^{N-1}$. For any probability density $F$
on $S^{N-1}$ the entropy of $F$ is $S(F)$ defined by

$$S(F) = \int_{S^{N-1}} F \ln F\, d\mu , \tag{1.11}$$

and likewise, the entropy of the marginal is given by

$$S(f_j) = \int_{[-1,1]} f_j \ln f_j\, d\nu_N = \int_{S^{N-1}} f_j \ln f_j\, d\mu .$$

How do the entropies of the marginals $f_j$ compare with the entropy of their parent density
$F$? The following theorem provides an answer:

Theorem 2 For all $N \ge 2$, given any probability density $F$ on $S^{N-1}$, let $f_j$ be the $j$th
marginal of $F$ for $j = 1, 2, \dots, N$. Then

$$\sum_{j=1}^N S(f_j) \le 2 S(F) , \tag{1.12}$$

and the constant 2 on the right side of (1.12) is the best possible.

The inequality (1.12) may be compared to the familiar subadditivity of the entropy
inequality on $\mathbb{R}^N$: Let $G$ be any probability density on $\mathbb{R}^N$ with respect to $d\gamma_N$, and let
$g_j$ denote its $j$th marginal, which is obtained by integrating out all of the variables except
$v_j$. In this case, there is no relation among the coordinate functions. Hence

$$\int_{\mathbb{R}^N} \prod_{j=1}^N g_j(v_j)\, d\gamma_N = \prod_{j=1}^N \left( \int_{\mathbb{R}^N} g_j(v_j)\, d\gamma_N \right) = 1 ,$$

so that $H = \prod_{j=1}^N g_j$ is another probability density on $\mathbb{R}^N$. Then by Jensen's inequality,

$$\int_{\mathbb{R}^N} G \ln\left( \frac{G}{H} \right) d\gamma_N \ge 0 ,$$

and there is equality if and only if $G = H$. Defining the entropy $S(G)$ of a density $G$
relative to $d\gamma_N$ by $S(G) = \int G \ln G\, d\gamma_N$, this says

$$\sum_{j=1}^N S(g_j) \le S(G) \tag{1.13}$$

with equality if and only if $G = H$. Note the difference between (1.13) and (1.12): The
latter requires an extra factor of 2 on the right, independent of $N$. This is due to the
dependence of the coordinate functions $v_j$ resulting from the constraint $\sum_{j=1}^N v_j^2 = 1$.

The difference between (1.13) and (1.12) is especially striking given the close relation
between $d\mu$ and $d\gamma_N$. The inequality in Theorem 2 does not depend on the radius of
the sphere, since the uniform measure is normalized, and so Theorem 2 says that there
is a dimension independent departure from the equivalence of ensembles as measured by
subadditivity of the entropy.

This dimension independence would not be guessed by linearizing the inequality in
Theorem 2 about $F = 1$; it is a non-perturbative effect. The natural perturbative calculation
would suggest that the difference between (1.13) and (1.12) "washes out" with increasing
$N$, as we now explain.

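Consider a probability density $F$ on $S^{N-1}$ of the form

$$F = 1 + \epsilon H$$

where $H$ is bounded and orthogonal to 1 in $L^2(S^{N-1})$. Then $f_j = P_jF = 1 + \epsilon h_j$ where
$h_j = P_jH$. Of course $h_j$ is also orthogonal to 1 in $L^2(S^{N-1})$.
A simple and frequently encountered computation gives us

$$S(F) = \frac{\epsilon^2}{2} \|H\|^2_{L^2(S^{N-1})} + O(\epsilon^3) \qquad \text{and} \qquad \sum_{j=1}^N S(f_j) = \frac{\epsilon^2}{2} \sum_{j=1}^N \|h_j\|^2_{L^2(S^{N-1})} + O(\epsilon^3) .$$

Since

$$\|h_j\|^2_{L^2(S^{N-1})} = \langle H, P_jH \rangle_{L^2(S^{N-1})} ,$$

if we define the operator

$$P = \frac{1}{N} \sum_{j=1}^N P_j ,$$

we have that

$$\frac{\sum_{j=1}^N S(P_jF)}{S(F)} = N\, \frac{\langle H, PH \rangle_{L^2(S^{N-1})}}{\|H\|^2_{L^2(S^{N-1})}} + O(\epsilon) .$$

An optimist might then hope that the supremum of $\sum_{j=1}^N S(P_jF)/S(F)$ taken over all
probability densities $F$ would be given by $C_N$ where

$$C_N = N \sup\left\{ \langle H, PH \rangle_{L^2(S^{N-1})} \ :\ \|H\|_{L^2(S^{N-1})} = 1 ,\ \langle H, 1 \rangle_{L^2(S^{N-1})} = 0 \right\} .$$

The computation of the supremum is an eigenvalue problem, and has been done in [6;
Theorems 1.2 and 2.1]. The result is

$$C_N = 1 + \frac{3}{N+1} .$$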

The surplus over 1, namely $3/(N + 1)$, measures the "departure from independence" as
a function of $N$. Thus, if one considers densities $F$ that deviate only slightly from the
uniform density, one gets a correction term to the constant 1 in the Gaussian entropy
inequality (1.13) that "remembers" the dependence of the coordinates on the sphere, but
which vanishes as $N \to \infty$.

The precise size of this "departure from independence" as a function of $N$ is crucial in
some problems of non-equilibrium statistical mechanics. The computation of $C_N$ was at
the heart of recent progress in computing the rate of relaxation to equilibrium in kinetic
theory by direct consideration of an $N$ body system, as proposed long ago by Mark Kac.
For more details, see [6] and [7].

The fact that for more general densities $F$, the correction term to the Gaussian entropy
inequality (1.13) does not vanish as $N \to \infty$ complicates the estimation of rates of relaxation
in entropic terms for $N$ body systems in kinetic theory. This said, we turn to the
proof of Theorem 2.

Proof of Theorem 2: Let $F$ be any probability density on $S^{N-1}$, $N \ge 2$, and let
$f_j$, $j = 1, 2, \dots, N$, be its marginals. Then since $f_j$ is a probability density,
$\|f_j^{1/2}\|_{L^2(S^{N-1})} = 1$. As a consequence of Theorem 1, if we define $C$ by

$$C = \int_{S^{N-1}} \prod_{j=1}^N f_j^{1/2}(v_j)\, d\mu ,$$

we have $C \le 1$.

Suppose that $C = 0$. Then, $\prod_{j=1}^N f_j = 0$ almost everywhere, and so $\sum_{j=1}^N \ln f_j = -\infty$
almost everywhere. This would imply

$$\sum_{j=1}^N S(f_j) = \sum_{j=1}^N \int_{S^{N-1}} F \ln f_j\, d\mu = -\infty .$$

This is impossible, since by Jensen's inequality, $S(f_j) \ge 0$ for each $j$. Therefore, $0 < C \le 1$,
and we may define a probability density $H$ on $S^{N-1}$ through

$$H = \frac{1}{C} \prod_{j=1}^N f_j^{1/2}(v_j) .$$

As above, we now apply Jensen's inequality to conclude that

$$0 \le \int_{S^{N-1}} F \ln\left( \frac{F}{H} \right) d\mu .$$

The right hand side is easily seen to be

$$\int F \ln F\, d\mu - \int F \ln H\, d\mu
= \int F \ln F\, d\mu - \int F \ln\Big( \prod_{j=1}^N f_j^{1/2} \Big)\, d\mu + \ln(C)
= \int F \ln F\, d\mu - \frac{1}{2} \sum_{j=1}^N \int f_j \ln f_j\, d\mu + \ln(C) .$$

Since $\ln(C) < 0$ unless each $f_j = 1$, the inequality is proved, with equality holding only
when each $f_j = 1$. The fact that the constant cannot be less than 2 in the inequality
follows by finding a trial function that we present in the Appendix. ■



As discussed above, the factor of 2 in Theorem 2 is a correction to the classical
subadditivity of the entropy that is required on account of the dependence of the coordinate
functions due to the constraint $\sum_{j=1}^N v_j^2 = 1$. The remarkable fact that the size of this
effect is independent of $N$ depends on the specific nature of the constraint, and is not a
general fact.

For example, consider the planar constraint $\sum_{j=1}^N v_j = 0$ on $\mathbb{R}^N$, and let $P_{N-1}$ denote
the hyperplane specified by this constraint. Let $\mu$ be a centered, isotropic Gaussian
probability measure on $P_{N-1}$. As we explain below, a special case of the Brascamp-Lieb
Theorem yields the sharp inequality

$$\int_{P_{N-1}} \Big( \prod_{j=1}^N f_j(v_j) \Big)\, d\mu \le \prod_{j=1}^N \|f_j\|_{L^{N/(N-1)}(P_{N-1}, \mu)} . \tag{1.15}$$

This is an analog of (1.9) for the planar constraint. Notice however, that this time the
indices depend on $N$, and diminish towards 1 as $N$ increases.

Just as Theorem 2 follows from Theorem 1, one obtains an entropy subadditivity
inequality for the planar constraint from (1.15). Given a probability density $F$ on $P_{N-1}$
with respect to the reference measure $\mu$, define the marginal densities as above. Then the
analog of (1.12) is the inequality

$$\sum_{j=1}^N S(f_j) \le \frac{N}{N-1} S(F) . \tag{1.16}$$

This time, since $\lim_{N\to\infty} N/(N - 1) = 1$, the effect of the constraint, as far as subadditivity
of the entropy is concerned, diminishes to zero as $N$ tends to infinity.

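The connection between (1.15) and Young's inequality is revealing. To see the connection,
we change variables. Let $e_j$, $j = 1, 2, \dots, N$, be the standard basis vectors in $\mathbb{R}^N$.
Let $u_j$ be the normalized orthogonal projection of $e_j$ onto the hyperplane $P_{N-1}$. One
easily checks that for $i \ne j$,

$$u_i \cdot u_j = -\frac{1}{N-1} ,$$

and that for $v$ in $P_{N-1}$,

$$v_j = v \cdot e_j = \sqrt{\frac{N-1}{N}}\; v \cdot u_j \tag{1.17}$$

and

$$|v|^2 = \frac{N-1}{N} \sum_{j=1}^N (u_j \cdot v)^2 . \tag{1.18}$$

For convenient constants, choose a scale so that the Gaussian density is $M(v) = e^{-\pi |v|^2}$.
Defining the single variable functions $g_j(y) = f_j\big( \sqrt{(N-1)/N}\; y \big)\, e^{-\pi ((N-1)/N) y^2}$, we have from
(1.17) and (1.18) that

$$\prod_{j=1}^N f_j(v_j)\, M(v) = \prod_{j=1}^N g_j(u_j \cdot v) , \tag{1.19}$$

and if $d\mathcal{L}$ denotes Lebesgue measure on $P_{N-1}$,

$$\int_{P_{N-1}} \Big( \prod_{j=1}^N f_j(v_j) \Big)\, d\mu = \int_{P_{N-1}} \Big( \prod_{j=1}^N g_j(u_j \cdot v) \Big)\, d\mathcal{L} . \tag{1.20}$$

Furthermore, for each $j$, $\int_{P_{N-1}} |f_j(v_j)|^{N/(N-1)}\, d\mu = \int_{\mathbb{R}} |g_j(y)|^{N/(N-1)}\, dy$. Therefore,
identifying $P_{N-1}$ with $\mathbb{R}^{N-1}$ in the obvious way, (1.15) is equivalent to the inequality

$$\int_{\mathbb{R}^{N-1}} \Big( \prod_{j=1}^N g_j(u_j \cdot v) \Big)\, d\mathcal{L} \le \prod_{j=1}^N \|g_j\|_{L^{N/(N-1)}(\mathbb{R})} , \tag{1.21}$$

which is a special case of the Brascamp-Lieb generalization of Young's inequality.

In fact, for $N = 3$, it follows from the sharp form of the classical Young's inequality for
convolutions. To see this, let $s = u_1 \cdot v$ and $t = u_3 \cdot v$, and notice that since $u_1 + u_2 + u_3 = 0$,
we have that $-(s + t) = u_2 \cdot v$. A simple computation reveals that $d\mathcal{L} = (2/\sqrt{3})\, ds\, dt$, so
that the $N = 3$ case of (1.21) becomes

$$\int_{\mathbb{R}^2} g_1(s)\, g_2(-s - t)\, g_3(t)\, ds\, dt \le \frac{\sqrt{3}}{2} \|g_1\|_{L^{3/2}(\mathbb{R})} \|g_2\|_{L^{3/2}(\mathbb{R})} \|g_3\|_{L^{3/2}(\mathbb{R})} .$$

This in turn is equivalent to the inequality

$$\|g_1 * g_2\|_{L^3(\mathbb{R})} \le \frac{\sqrt{3}}{2} \|g_1\|_{L^{3/2}(\mathbb{R})} \|g_2\|_{L^{3/2}(\mathbb{R})} ,$$

which is sharp. We now turn to the proof of Theorem 1.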
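
The geometric facts used in this change of variables are easy to check by machine. The sketch below is our own illustration, with a hypothetical choice $N = 5$ and a random test point. It builds the vectors $u_j$, verifies $u_i \cdot u_j = -1/(N-1)$ and the relation $v_j = \sqrt{(N-1)/N}\; v \cdot u_j$ on the hyperplane, and compares the constant $\sqrt{3}/2$ with the well-known sharp Young constant $A_p A_q A_{r'}$, where $A_p = (p^{1/p}/p'^{1/p'})^{1/2}$, for $p = q = 3/2$ and $r = 3$. That formula for the sharp constant is a standard fact, not taken from this text.

```python
import math
import random

def projected_basis(N):
    """Unit vectors u_j: normalised projections of e_j onto {v : sum(v) = 0}."""
    scale = math.sqrt(N / (N - 1.0))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / N) for i in range(N)]
            for j in range(N)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

N = 5                                   # hypothetical dimension
u = projected_basis(N)

random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(N)]
mean = sum(w) / N
v = [x - mean for x in w]               # a point of the hyperplane P_{N-1}

gram_offdiag = [dot(u[i], u[j]) for i in range(N) for j in range(N) if i != j]
coords_ok = all(
    abs(v[j] - math.sqrt((N - 1.0) / N) * dot(v, u[j])) < 1e-12 for j in range(N)
)

def beckner_A(p):
    """A_p = (p^{1/p} / p'^{1/p'})^{1/2}, where 1/p + 1/p' = 1."""
    q = p / (p - 1.0)
    return math.sqrt(p ** (1.0 / p) / q ** (1.0 / q))

# Sharp Young constant for p = q = 3/2, r = 3 (so r' = 3/2) in one dimension.
young_constant = beckner_A(1.5) ** 3
print(gram_offdiag[0], coords_ok, young_constant, math.sqrt(3.0) / 2.0)
```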

2. Proof of Young's inequality on $S^{N-1}$

We prove Theorem 1 using a non-linear heat semigroup. For $1 \le i, j \le N$, $i \ne j$, let

$$L_{i,j} = v_i \frac{\partial}{\partial v_j} - v_j \frac{\partial}{\partial v_i} .$$

The Laplacian on $S^{N-1}$ is the operator

$$\Delta = \sum_{i<j} L_{i,j}^2 = \frac{1}{2} \sum_{i \ne j} L_{i,j}^2 . \tag{2.1}$$

The normalization of the gradient on $S^{N-1}$ implicit in this is convenient; for smooth
functions $f$ and $g$, we write

$$\nabla f \cdot \nabla g = \sum_{i<j} L_{i,j}f\, L_{i,j}g = \frac{1}{2} \sum_{i \ne j} L_{i,j}f\, L_{i,j}g , \tag{2.2}$$

and $|\nabla f|^2 = \nabla f \cdot \nabla f$.

Now fix any $p \ge 1$. For any smooth, non negative function $g$ in $L^p(S^{N-1})$, and any
$t \ge 0$, define

$$g(v, t) = \left( e^{t\Delta} g^p(v) \right)^{1/p} . \tag{2.3}$$

The first thing to observe is that $g(\cdot, t)$ will be smooth and strictly positive for all $t > 0$,
and the $L^p(S^{N-1})$ norm of $g$ is conserved under this evolution:

$$\|g(\cdot, t)\|_{L^p(S^{N-1})} = \|g\|_{L^p(S^{N-1})} \tag{2.4}$$

for all $t \ge 0$.

The second thing to observe is that if $g$ depends only on $v_j$ for some $j$, so does $g(\cdot, t)$.
The reason is that $g$ depends only on $v_j$ if and only if $g$ is invariant under all rotations that
fix the $j$th coordinate axis, and these rotations commute with the Laplacian. We write
$g(v_j, t)$ to denote the evolution of such a function.

The third thing to observe is that the evolution, though non-linear, has the semigroup
property: For all $s, t \ge 0$,

$$g(v, s + t) = \left( e^{s\Delta} g^p(v, t) \right)^{1/p} . \tag{2.5}$$

The fourth thing to observe is that

$$\lim_{t\to\infty} g(v, t) = \|g\|_{L^p(S^{N-1})} \tag{2.6}$$

uniformly in $v$.

Finally, a simple computation shows that for any smooth, positive function $g$ on $S^{N-1}$,

$$\frac{\partial}{\partial t} g(v, t) \Big|_{t=0} = \frac{1}{p}\, g^{1-p} \Delta g^p = \Delta g + (p - 1) \frac{|\nabla g|^2}{g} . \tag{2.7}$$
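
The identity (2.7) is a pointwise chain rule computation, and can be checked by finite differences in the simplest case $N = 2$, where $L_{1,2} = \partial/\partial\theta$ in the angular coordinate and $\Delta = d^2/d\theta^2$. The sketch below is only an illustration; the test function $g(\theta) = 2 + \cos\theta$, the step size and the sample points are hypothetical choices of ours.

```python
import math

def first_derivative(F, x, h=1e-3):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

def second_derivative(F, x, h=1e-3):
    """Central difference approximation of F''(x)."""
    return (F(x + h) - 2.0 * F(x) + F(x - h)) / (h * h)

def g(theta):
    """Hypothetical smooth positive test function on the circle."""
    return 2.0 + math.cos(theta)

def lhs(theta, p):
    """(1/p) g^{1-p} Delta(g^p), with Delta = d^2/dtheta^2 on the circle."""
    return g(theta) ** (1.0 - p) * second_derivative(lambda x: g(x) ** p, theta) / p

def rhs(theta, p):
    """Delta g + (p - 1) |grad g|^2 / g."""
    return second_derivative(g, theta) + (p - 1.0) * first_derivative(g, theta) ** 2 / g(theta)

max_gap = max(
    abs(lhs(th, p) - rhs(th, p))
    for p in (1.5, 2.0, 4.0)
    for th in (0.3, 1.0, 2.5, 4.0)
)
print(max_gap)  # small, limited only by the finite difference error
```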



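Lemma 2.1 Consider any $N$ non negative functions $g_1, g_2, \dots, g_N$ in $L^2([-1, 1], d\nu_N)$.
Use $p = 2$ and $g_j$ in place of $g$ in (2.3) to define $g_j(v_j, t)$. Then by the smoothing properties
of the heat equation, the function $\phi(t)$ defined by

$$\phi(t) = \int_{S^{N-1}} \prod_{j=1}^N g_j(v_j, t)\, d\mu$$

is differentiable for all $t > 0$, and is right continuous at $t = 0$. Moreover, introducing the
functions $h_j$ and $G$ defined by

$$h_j(v_j, t) = \ln g_j(v_j, t) , \quad j = 1, 2, \dots, N , \qquad \text{and} \qquad G = \prod_{j=1}^N g_j , \tag{2.8}$$

we have

$$\frac{d}{dt} \phi(t) = \frac{1}{2} \int_{S^{N-1}} \sum_{i \ne k} \left[ L_{i,k}h_k - L_{i,k}h_i \right]^2 G\, d\mu . \tag{2.9}$$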



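Proof The statements about smoothness and continuity require no justification. Taking
$p = 2$ and $g = g_k(v_k, t)$ in (2.7), we have

$$\frac{\partial}{\partial t} g_k(v_k, t) = \Delta g_k(v_k, t) + \frac{|\nabla g_k(v_k, t)|^2}{g_k(v_k, t)} .$$

Hence, suppressing the arguments on the right,

$$\frac{d}{dt} \left( \int_{S^{N-1}} \prod_{j=1}^N g_j(v_j, t)\, d\mu \right) \Big|_{t=0}
= \sum_{k=1}^N \int_{S^{N-1}} \left( \Delta g_k + \frac{|\nabla g_k|^2}{g_k} \right) \prod_{\ell=1, \ell \ne k}^N g_\ell\, d\mu .$$

The integral on the right can be written as

$$\int_{S^{N-1}} \sum_{k=1}^N (\Delta g_k) \prod_{\ell=1, \ell \ne k}^N g_\ell\, d\mu
+ \int_{S^{N-1}} \sum_{k=1}^N \frac{|\nabla g_k|^2}{g_k} \prod_{\ell=1, \ell \ne k}^N g_\ell\, d\mu . \tag{2.10}$$

Clearly, the second integral on the right is non negative. We therefore examine the first
integral.

Observe that $L_{i,j}g_k = 0$ unless either $i = k$ or $j = k$. Therefore,

$$\int_{S^{N-1}} \sum_{k=1}^N (\Delta g_k) \prod_{\ell=1, \ell \ne k}^N g_\ell\, d\mu
= \int_{S^{N-1}} \sum_{k=1}^N \Big( \sum_{i<k} L_{i,k}^2 g_k + \sum_{j>k} L_{k,j}^2 g_k \Big) \prod_{\ell=1, \ell \ne k}^N g_\ell\, d\mu . \tag{2.11}$$

Integrating by parts,

$$\int_{S^{N-1}} \sum_{k=1}^N \Big( \sum_{i<k} L_{i,k}^2 g_k \Big) \prod_{\ell=1, \ell \ne k}^N g_\ell\, d\mu
= - \int_{S^{N-1}} \sum_{k=1}^N \Big( \sum_{i<k} L_{i,k}g_k\, L_{i,k}g_i \Big) \prod_{\ell=1, \ell \ne i,k}^N g_\ell\, d\mu . \tag{2.12}$$

Using the notations in (2.8), the integral on the right side of (2.12) is

$$- \int_{S^{N-1}} \sum_{k=1}^N \sum_{i<k} \left( L_{i,k}h_k\, L_{i,k}h_i \right) G\, d\mu .$$

Doing the same integration by parts on the remaining terms in (2.11), and substituting $i$
for $j$, we have

$$\int_{S^{N-1}} \sum_{k=1}^N (\Delta g_k) \prod_{\ell=1, \ell \ne k}^N g_\ell\, d\mu
= - \int_{S^{N-1}} \sum_{i \ne k} \left( L_{i,k}h_k\, L_{i,k}h_i \right) G\, d\mu . \tag{2.13}$$

With the same notations, we have

$$\int_{S^{N-1}} \sum_{k=1}^N \frac{|\nabla g_k|^2}{g_k} \prod_{\ell=1, \ell \ne k}^N g_\ell\, d\mu
= \int_{S^{N-1}} \sum_{i \ne k} \left( L_{i,k}h_k \right)^2 G\, d\mu . \tag{2.14}$$

Combining (2.13) and (2.14) we see that

$$\frac{d}{dt} \int_{S^{N-1}} \prod_{j=1}^N g_j(v_j, t)\, d\mu
= \frac{1}{2} \int_{S^{N-1}} \sum_{i \ne k} \left[ L_{i,k}h_k - L_{i,k}h_i \right]^2 G\, d\mu .$$

This is (2.9).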



Proof of Theorem 1: By Lemma 2.1, applied with $g_j = f_j$ and $p = 2$, and by (2.6), the
difference between the right and left hand sides of (1.9) is

$$\int_0^\infty \left( \frac{d}{dt} \int_{S^{N-1}} \prod_{j=1}^N g_j(v_j, t)\, d\mu \right) dt
= \frac{1}{2} \int_0^\infty \left( \int_{S^{N-1}} \sum_{i \ne k} \left[ L_{i,k}h_k - L_{i,k}h_i \right]^2 G\, d\mu \right) dt \ge 0 .$$

This proves the inequality for $p = 2$; the case $p > 2$ follows since the $L^p$ norm with respect
to a probability measure is non-decreasing in $p$.

Also, it is now clear that for all $t > 0$, each $h_k$ is smooth and bounded, and $G$ is strictly
positive, so that there is equality in (1.9) if and only if $[(L_{i,k}h_k) - (L_{i,k}h_i)]^2 = 0$ for all
$t > 0$, all $v$ and all $i \ne k$.

Fixing $t$, $i$ and $k$, this requires

$$v_i h_k'(v_k) = -v_k h_i'(v_i) .$$

This implies that for some constant $c$,

$$\frac{h_k'(v_k)}{v_k} = -\frac{h_i'(v_i)}{v_i} = c \tag{2.15}$$

for all values of $v_i$ and $v_k$.

Hence, for all $i \ne k$, $h_i'$ and $h_k'$ are linear functions with slopes of the same magnitude but
opposite signs. For $N \ge 3$, the signs of all pairs cannot be opposite unless all of the slopes
are zero. This concludes the proof that there is equality in (1.9) if and only if each of the
functions there is constant.*

In the appendix, there is an explicit example showing that (1.9) cannot hold if $L^2(S^{N-1})$
is replaced by $L^p(S^{N-1})$ for any $p < 2$. In fact, it is shown that for any $p < 2$, there is
a function $f$ so that with $f_j = f$ for all $j$, the left hand side of (1.9) is infinite, and the
right hand side is finite. Alternatively, one can see that if (1.9) did hold with 2 replaced
by some $p < 2$, then Theorem 2 would hold with 2 replaced by this value of $p$, which we
have seen is not possible. ■
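
The monotonicity behind this proof can be observed numerically in a case where the flow (2.3) is explicit. The sketch below is our own illustration, not part of the argument. It takes $N = 3$, $p = 2$ and the hypothetical choice $g_j^2 = 1 + a v_j$ with $a = 0.9$. Since $\Delta v_j = -(N-1) v_j$, one has $e^{t\Delta}(1 + a v_j) = 1 + a e^{-2t} v_j$ on $S^2$, so $g_j(v_j, t) = (1 + a e^{-2t} v_j)^{1/2}$ and each $\|g_j\|_{L^2(S^2)} = 1$. The function $\phi(t)$ of Lemma 2.1 should therefore increase towards 1.

```python
import math

def sphere_average(h, n_theta=120, n_phi=120):
    """Average h(v1, v2, v3) over S^2 by a midpoint rule in spherical coordinates."""
    total = 0.0
    weight_sum = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * math.pi / n_theta
        w = math.sin(theta)
        for k in range(n_phi):
            ang = (k + 0.5) * 2.0 * math.pi / n_phi
            v1 = math.sin(theta) * math.cos(ang)
            v2 = math.sin(theta) * math.sin(ang)
            v3 = math.cos(theta)
            total += w * h(v1, v2, v3)
            weight_sum += w
    return total / weight_sum

def phi(t, a=0.9):
    """phi(t) = integral of prod_j g_j(v_j, t) for g_j^2 = 1 + a v_j (hypothetical data)."""
    c = a * math.exp(-2.0 * t)  # e^{t Delta}(1 + a v_j) = 1 + a e^{-2t} v_j on S^2
    return sphere_average(
        lambda v1, v2, v3: math.sqrt((1.0 + c * v1) * (1.0 + c * v2) * (1.0 + c * v3))
    )

times = [0.0, 0.1, 0.3, 1.0, 3.0]
values = [phi(t) for t in times]
for t, val in zip(times, values):
    print(t, val)  # increasing in t, approaching prod_j ||g_j||_2 = 1
```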

The simple heat flow argument that was used to prove Theorem 1 can be adapted to
other situations as well. Indeed, one could easily consider inequalities for integrals over
$S^{N-1}$ of more general products $\prod_{j=1}^P f_j(a_j \cdot v)$. The case considered here was $P = N$ and
$a_j = e_j$ because that was what was relevant for Theorem 2. Further generalizations are
possible, and may be interesting.

In the next sections, we exhibit the versatility of the method by showing that a heat
flow interpolation between trial functions and Gaussians can be used to prove the original
Brascamp-Lieb inequality on $\mathbb{R}^M$. Barthe [2], [3] has given a proof of this inequality,
together with a dual inverse inequality using an interpolation based on optimal mass
transport. It was somewhat surprising that one could prove the Brascamp-Lieb inequality
with such a simple heat flow interpolation, and after hearing a report on our work, Barthe
and Cordero-Erausquin [4] added to the surprise by showing that a heat flow interpolation
could be used to prove the inverse dual inequality as well.

3. The generalized Young's inequality on $\mathbb{R}^M$

We have introduced this inequality in the introduction, and shall use the same notation
here. Recall that for any $N \ge M$, $a_1, a_2, \dots, a_N$ is a set of $N$ non zero vectors in $\mathbb{R}^M$.
Let $f_1, f_2, \dots, f_N$ be any set of $N$ non negative measurable functions on $\mathbb{R}$, and consider
the integral

$$\int_{\mathbb{R}^M} \prod_{j=1}^N f_j(a_j \cdot x)\, d^Mx . \tag{3.1}$$

There are certain natural restrictions on the underlying set of vectors $a_j$. First of all,
$\{a_1, a_2, \dots, a_N\}$ must span $\mathbb{R}^M$ for the integral in (1.2) to possibly converge. Second, it
is natural to assume that no pair of vectors $a_i$ and $a_j$ are proportional; if they were, we
could combine two factors into one in the integrand (1.2). These assumptions will be in
force throughout the following sections.

As before, given numbers $p_j$ with $1 \le p_j \le \infty$ for $j = 1, 2, \dots, N$, form the vector
$p = (1/p_1, 1/p_2, \dots, 1/p_N)$, and define $D(p)$ and $D_G(p)$ through (1.2) and (1.3) respectively.

* The analysis of (2.15) in a preprint of this paper contained an error. This was pointed out and corrected
in a private communication from Shannon Starr, to whom we are grateful.

The Brascamp and Lieb Theorem asserts that $D(p) = D_G(p)$. As in the proof of Theorem 1,
we shall use a non linear semigroup based on an appropriately chosen heat kernel
to interpolate between arbitrary trial functions and Gaussian optimizers. The appropriate
choice of the heat kernel depends on both $p$ and the vectors $\{a_1, \dots, a_N\}$. We shall show
in this section that the desired heat kernel exists whenever the supremum is attained in the
Gaussian variational problem (1.3) for given $p$ and $\{a_1, \dots, a_N\}$. Note that the supremum
being attained means that there are numbers $0 < s_j < \infty$ so that with $g_j(y) = e^{-(s_j y)^2/2}$,

$$\int_{\mathbb{R}^M} \prod_{j=1}^N g_j(a_j \cdot x)\, d^Mx = D_G(p) \prod_{j=1}^N \|g_j\|_{p_j} .$$

In this case, we shall say that the Gaussian variational problem has an optimizer, and we
identify the optimizer with the vector in $\mathbb{R}^N$ whose $j$th entry is $s_j$.

Theorem 3.1 Let $\{a_1, a_2, \dots, a_N\}$ be a set of vectors spanning $\mathbb{R}^M$ and suppose that
the vector $p$ is such that the Gaussian variational problem (1.3) has a maximizer. Then

$$D(p) = D_G(p) .$$

By itself, this theorem is contained in the Brascamp-Lieb Theorem, which asserts that
$D(p) = D_G(p)$ in general. However, as we shall see in the next section, Theorem 3.1
provides the essential reduction of (1.2) to (1.3), and to complete the analysis and prove
the full result, one needs only certain facts about the Gaussian variational problem. For
the most part, the facts we need are contained in the work of Barthe [3], so that once we
have proved Theorem 3.1, our work is largely done. The rest of this section is devoted to
the proof of Theorem 3.1.

As preparation for the proof, let $R$ be any invertible $M \times M$ matrix, and consider the
heat semigroup $e^{tL}$ generated by

$$L = \nabla \cdot R^tR \nabla . \tag{3.2}$$

For each $j$, and each $t > 0$, define $f_j(t, \cdot)$ by

$$f_j(t, a_j \cdot x) = \left( e^{tL} f_j^{p_j}(a_j \cdot x) \right)^{1/p_j} . \tag{3.3}$$

Since $L$ commutes with translations, the set of functions on $\mathbb{R}^M$ of the form $f(a \cdot x)$ is
invariant under $e^{tL}$. In fact, for any bounded function $f_0$ on $\mathbb{R}$, for all $t > 0$, $e^{tL}f_0(a \cdot x) =
f(t, a \cdot x)$ where $f(t, y)$ is the solution of

$$\frac{\partial}{\partial t} f(t, y) = |Ra|^2 \frac{\partial^2}{\partial y^2} f(t, y) , \qquad f(0, y) = f_0(y) . \tag{3.4}$$

The fundamental solution of (3.4) is $g(t, y) = (4\pi |Ra|^2 t)^{-1/2} e^{-y^2/(4|Ra|^2 t)}$. Therefore,

$$\lim_{t\to\infty} t^{1/(2p_j)} f_j(t, t^{1/2} y) = \|f_j\|_{p_j} \left( 4\pi |Ra_j|^2 \right)^{-1/(2p_j)} e^{-y^2/(4 p_j |Ra_j|^2)} \tag{3.5}$$

with pointwise convergence.

Let $g_j(y)$ denote the centered Gaussian function defined by the right hand side of (3.5).
Note also that for each $j$ and $t$,

$$\|f_j(t, \cdot)\|_{p_j} = \|f_j\|_{p_j} = \|g_j\|_{p_j} . \tag{3.6}$$

If we assume that each $f_j$ is bounded and has compact support, then it is possible to
obtain simple Gaussian bounds on each $f_j(t, y)$ from which, using (3.5) and the obvious
dominated convergence argument, it follows that

$$\lim_{t\to\infty} \int_{\mathbb{R}^M} \prod_{j=1}^N t^{1/(2p_j)} f_j(t, t^{1/2}(a_j \cdot x))\, d^Mx = \int_{\mathbb{R}^M} \prod_{j=1}^N g_j(a_j \cdot x)\, d^Mx .$$

Moreover, by the scale invariance that obtains under (4.1),

$$\int_{\mathbb{R}^M} \prod_{j=1}^N t^{1/(2p_j)} f_j(t, t^{1/2}(a_j \cdot x))\, d^Mx = \int_{\mathbb{R}^M} \prod_{j=1}^N f_j(t, a_j \cdot x)\, d^Mx ,$$

so that

$$\lim_{t\to\infty} \int_{\mathbb{R}^M} \prod_{j=1}^N f_j(t, a_j \cdot x)\, d^Mx = \int_{\mathbb{R}^M} \prod_{j=1}^N g_j(a_j \cdot x)\, d^Mx .$$

It now follows that if we can choose $R$ so that $\int_{\mathbb{R}^M} \prod_{j=1}^N f_j(t, a_j \cdot x)\, d^Mx$ is a non
decreasing function of $t$, then

$$\frac{\int_{\mathbb{R}^M} \prod_{j=1}^N f_j(a_j \cdot x)\, d^Mx}{\prod_{j=1}^N \|f_j\|_{p_j}} \le \frac{\int_{\mathbb{R}^M} \prod_{j=1}^N g_j(a_j \cdot x)\, d^Mx}{\prod_{j=1}^N \|g_j\|_{p_j}} .$$

By this argument, proof of the Brascamp-Lieb Theorem is reduced to finding a fixed
matrix $R$ so that the function $\eta(t)$ defined by

$$\eta(t) = \int_{\mathbb{R}^M} \prod_{j=1}^N f_j(t, a_j \cdot x)\, d^Mx \tag{3.7}$$

is non-decreasing where $f_j(t, y)$ is determined through the choice of $R$ by (3.2) and (3.3).

If this is to work at all, the Gaussian functions $g_j$ defined by the limit in (3.5) must
be maximizers for the variational problem (1.2), and certainly for the variational problem
(1.3). We can gain insight into how $R$ must be chosen by considering the Euler-Lagrange
equation for (1.3).

For each $j$, let $g_j$ be the centered Gaussian function given by $g_j(y) = e^{-(s_j y)^2/2}$. Then
a simple calculation reveals that

( -^^-n/y^;^:^;-"^^"" ) = -^Hsj) - in (det {AS^A^)) , (3.8) 

where $S$ is the diagonal $N \times N$ matrix whose $j$th diagonal entry is $s_j$, and $A$ is the
$M \times N$ matrix whose $j$th column is $a_j$; i.e., $A = [a_1, a_2, \dots, a_N]$. (This notation $A =
[a_1, a_2, \dots, a_N]$ will be used repeatedly in what follows.)

Introduce the variables $t_1, \dots, t_N$ by $t_j = \ln(s_j^2)$. Let $T$ be the diagonal matrix whose
$j$th diagonal entry is $t_j$.

Define the function $\Phi$ on $\mathbb{R}^N$ by

$$\Phi(t_1, t_2, \dots, t_N) = \mathrm{Tr}\left( \ln\left( A e^T A^t \right) \right) . \tag{3.9}$$

Since $\ln\left( \det\left( A e^T A^t \right) \right) = \mathrm{Tr}\left( \ln\left( A e^T A^t \right) \right)$, we have from (1.3) and (3.8) that

2ln{Dg{p))^ sup \^-tj-(P{ti,t2,...,tN)\ (3.10) 

tl,t2,---,tN I ^.^j^ Pj I 

A simple calculation shows that

$$\frac{\partial}{\partial t_j} \Phi(t_1, t_2, \dots, t_N) = e^{t_j} a_j \cdot \left( A e^T A^t \right)^{-1} a_j
= (s_j a_j) \cdot \left( A S^2 A^t \right)^{-1} (s_j a_j) . \tag{3.11}$$

Therefore, the Euler-Lagrange equation for the optimization problem in (3.10) is

$$\frac{1}{p_j} = s_j a_j \cdot \left( AS (AS)^t \right)^{-1} s_j a_j
= e_j \cdot (AS)^t \left( AS (AS)^t \right)^{-1} (AS)\, e_j , \tag{3.12}$$

where $e_j$ is the $j$th standard basis vector in $\mathbb{R}^N$. Notice that since $\mathrm{rank}(A) = M$, the
matrix $(AS)^t \left( AS (AS)^t \right)^{-1} (AS)$ is just the orthogonal projection in $\mathbb{R}^N$ onto the range of $(AS)^t$.

We now show that if the supremum in (3.10) is attained, so that there is a positive
diagonal matrix $S$ satisfying (3.12), then we can choose

$$R = \left( AS (AS)^t \right)^{-1/2} , \tag{3.13}$$

and with this choice, the function $\eta(t)$ defined by (3.7) is non-decreasing. The key is the
following lemma:

Lemma 3.2 Let $f_1, f_2, \dots, f_N$ be $N$ bounded, non-negative measurable functions on $\mathbb{R}$
with compact support. Let $R$ be any invertible $M \times M$ matrix, and consider the heat
semigroup $e^{tL}$ generated by $L = \nabla \cdot R^tR \nabla$. For each $j$, and each $t > 0$, define $f_j(t, \cdot)$
by (3.3) and define the function $\eta(t)$ by (3.7). Then with $h_j(y, t) = \ln f_j(t, y)$ and
$F(x, t) = \prod_{j=1}^N f_j(t, a_j \cdot x)$, $\eta(t)$ is differentiable for $t > 0$, and

$$\frac{d}{dt} \eta(t) = \int_{\mathbb{R}^M} \Big( \sum_{i,j=1}^N Q_{i,j}\, h_i' h_j' \Big) F\, d^Mx , \tag{3.14}$$

where $Q$ is the $N \times N$ matrix with

$$Q_{i,j} = \delta_{i,j}\, p_j |Ra_j|^2 - Ra_i \cdot Ra_j . \tag{3.15}$$



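Proof: By (3.4) we have that

$$\frac{\partial}{\partial t} f_j(t, a_j \cdot x) = L f_j(t, a_j \cdot x) + (p_j - 1) \frac{|R \nabla f_j(t, a_j \cdot x)|^2}{f_j(t, a_j \cdot x)} ,$$

and hence

$$\frac{d}{dt} \eta(t) = \int_{\mathbb{R}^M} \sum_{j=1}^N \left[ L f_j(t, a_j \cdot x) + (p_j - 1) \frac{|R \nabla f_j(t, a_j \cdot x)|^2}{f_j(t, a_j \cdot x)} \right] \prod_{i=1, i \ne j}^N f_i(t, a_i \cdot x)\, d^Mx .$$

Let $h_j = \ln f_j$, and let $F(x) = \prod_{j=1}^N f_j(t, a_j \cdot x)$. Then, integrating by parts in the term
containing $L$, and suppressing arguments,

$$\frac{d}{dt} \eta(t) = \int_{\mathbb{R}^M} \Big[ \sum_{j=1}^N (p_j - 1) |Ra_j|^2 |h_j'|^2 - \sum_{i \ne j} (Ra_i \cdot Ra_j)\, h_i' h_j' \Big] F(x)\, d^Mx
= \int_{\mathbb{R}^M} \Big[ \sum_{j=1}^N p_j |Ra_j|^2 |h_j'|^2 - \sum_{i,j=1}^N (Ra_i \cdot Ra_j)\, h_i' h_j' \Big] F(x)\, d^Mx . \tag{3.16}$$

Using the definition (3.15), we have (3.14).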



Proof of Theorem 3.1: We apply Lemma 3.2, and must choose $R$ so that $Q$ is a
non-negative matrix. By assumption, there is a maximizer for the Brascamp-Lieb variational
problem (1.3), or equivalently (3.10), and hence there is a positive diagonal matrix $S$
such that the Euler-Lagrange equation (3.12) is satisfied for each $j$. In this case with
$R = \left( AS (AS)^t \right)^{-1/2}$, (3.15) becomes

$$Q = S^{-1} (I - P) S^{-1} \tag{3.17}$$

where $P = (AS)^t \left( AS (AS)^t \right)^{-1} (AS)$ is the orthogonal projection onto the range of $(AS)^t$.
This is certainly non-negative, and hence whenever the Gaussian variational problem (1.3)
has an optimizer, $D(p) = D_G(p)$. ■



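We close this section by expressing $D(p)$ in terms of $A$ and $S$ when the supremum in
(3.10) is attained. In this case, the optimizing Gaussians $g_j$ are given by the limit in (3.5).
We may assume that $\|f_j\|_{p_j} = 1$ for each $j$. With $R$ given by (3.13), the Euler-Lagrange
equation (3.12) says $|Ra_j|^2 = 1/(s_j^2 p_j)$, and hence
$g_j(a_j \cdot x) = \left( p_j s_j^2 / (4\pi) \right)^{1/(2p_j)} e^{-s_j^2 (a_j \cdot x)^2/4}$.
Thus,

$$\prod_{j=1}^N g_j(a_j \cdot x) = \prod_{j=1}^N \left( \frac{p_j s_j^2}{4\pi} \right)^{1/(2p_j)} e^{-x \cdot A S^2 A^t x/4} , \tag{3.18}$$

and therefore,

$$D(p) = \int_{\mathbb{R}^M} \prod_{j=1}^N g_j(a_j \cdot x)\, d^Mx = \prod_{j=1}^N \left( p_j s_j^2 \right)^{1/(2p_j)} \det\left( A S^2 A^t \right)^{-1/2} . \tag{3.19}$$

For future use, note that (3.18) can be written as

$$\prod_{j=1}^N g_j(a_j \cdot x) = D(p)\, \frac{\det\left( A S^2 A^t \right)^{1/2}}{(4\pi)^{M/2}}\, e^{-x \cdot A S^2 A^t x/4} . \tag{3.20}$$

Note that if $S$ satisfies the Euler-Lagrange equation (3.12), so does $\lambda S$ for any $\lambda > 0$.
Replacing $S$ by $\lambda S$ in (3.20), and taking $\lambda$ to infinity, we obtain $D(p)\delta_0$ in the limit, where
$\delta_0$ is the point mass at the origin. This will be used later on.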

4. The Gaussian optimization problem 

The analysis in the previous section leads very naturally to the following questions:
• For which values of p is D_G(p) finite?
• For which values of p is there an optimizer for the Gaussian variational problem (1.3)?

These questions have been answered by Barthe [3]. (In the special case in which every set of M vectors chosen from among {a_1, . . . , a_N} is a basis, this had been done in [5].) The answers may appear unfavorable for our program, since it turns out that in general there exist p for which D_G(p) is finite, but for which there is no optimizer for the Gaussian variational problem. Hence one additional observation is required to deduce the Brascamp-Lieb Theorem from Theorem 3.1.

First, we recall Barthe's answer to the first question, which is pleasingly simple: Let K_A denote the convex hull of the vectors z whose entries are either zero or one, and for which the set {a_j : z_j = 1} is a basis of \mathbb{R}^M. Barthe has proved [3] that D_G(p) is finite if and only if p \in K_A.

Note that K_A lies in the hyperplane in \mathbb{R}^N given by the equation \sum_{j=1}^N z_j = M. Let K_A^\circ denote the interior of K_A relative to this hyperplane. Barthe has also proved in [3] that when p \in K_A^\circ, the supremum in the Gaussian optimization problem (1.3) is attained.

In this section, we give another proof of these results. We do this for two reasons. First, our proof gives an alternate characterization of K_A that is directly checkable. Second, our proof is based on a partial scale invariance property of the functional that we seek to optimize. This partial scale invariance property of the functional is expressed in the identity (4.3) below. As we shall see, it completely determines the nature of K_A, and it provides a crucial handle on the variational problem in case p is on the boundary of K_A.

The obvious scale invariance argument shows that for D(p) or even D_G(p) to be finite, it is necessary that

    \sum_{j=1}^N \frac{1}{p_j} = M ,    (4.1)

and of course that 1 \le p_j \le \infty for each j. Indeed, let \lambda be any positive number, and replace each f_j(y) in (1.2) by f_j(\lambda y). The numerator in (1.2) is proportional to \lambda^{-M}, while the denominator is proportional to \lambda^{-\sum_{j=1}^N 1/p_j}. This excludes a finite maximum unless (4.1) holds.

A somewhat less obvious partial scaling argument leads to further restrictions on p. This depends on a simple identity that is crucial in what follows:

Lemma 4.1 Let {a_1, a_2, . . . , a_N} be a set of vectors spanning \mathbb{R}^M. Let S be any proper, non-empty subset of {1, 2, . . . , N}, and let r = dim(span({a_j : j \in S})). Then there are explicitly computable sets of vectors {b_j : 1 \le j \le N} and {c_j : j \in S^c} such that for any set of non negative functions f_j, each bounded and with compact support,

    \int_{\mathbb{R}^M} \prod_{j=1}^N f_j(a_j \cdot x)\, d^M x = \int_{\mathbb{R}^r} \prod_{j \in S} f_j(b_j \cdot y) \Big( \int_{\mathbb{R}^{M-r}} \prod_{j \in S^c} f_j(b_j \cdot y + c_j \cdot z)\, d^{M-r} z \Big) d^r y .    (4.2)

Proof: Let {u_1, . . . , u_r} be an orthonormal basis for the span of {a_j : j \in S}. Let {v_{r+1}, . . . , v_M} be an orthonormal basis for the orthogonal complement. Choose the sign of v_M so that \det([u_1, . . . , u_r, v_{r+1}, . . . , v_M]) = 1. Let U = [u_1, . . . , u_r], and let V = [v_{r+1}, . . . , v_M]. Define b_j = U^* a_j and c_j = V^* a_j. Likewise define y = U^* x and z = V^* x. Then a_j \cdot x = b_j \cdot y + c_j \cdot z, and for j \in S, a_j \cdot x = b_j \cdot y. Since d^M x = d^r y\, d^{M-r} z, and since, by construction, c_j = 0 for j \in S, (4.2) follows immediately. ■



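To apply this, we rescale in span({a_j : j \in S}) alone: For each j \in S, replace f_j by f_j^{(\lambda)} where f_j^{(\lambda)}(y) = \lambda^{1/p_j} f_j(\lambda y). Then \|f_j^{(\lambda)}\|_{p_j} = \|f_j\|_{p_j}, so that this replacement does not affect the denominator in (1.2). Then:

    \int_{\mathbb{R}^M} \prod_{j \in S} f_j^{(\lambda)}(a_j \cdot x) \prod_{j \in S^c} f_j(a_j \cdot x)\, d^M x
        = \lambda^{\sum_{j \in S} 1/p_j - r} \int_{\mathbb{R}^r} \prod_{j \in S} f_j(b_j \cdot y) \Big( \int_{\mathbb{R}^{M-r}} \prod_{j \in S^c} f_j(\lambda^{-1} b_j \cdot y + c_j \cdot z)\, d^{M-r} z \Big) d^r y .    (4.3)

We see that if

    \sum_{j \in S} \frac{1}{p_j} > r = dim(span({a_j : j \in S})) ,

then the integral in (4.3) diverges as \lambda tends to +\infty. Since \|f_j^{(\lambda)}\|_{p_j} = \|f_j\|_{p_j}, this means that D(p) is infinite in this case. These considerations justify the following definitions. *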

* Note that as \lambda tends to zero, \prod_{j \in S^c} f_j(\lambda^{-1} b_j \cdot y + c_j \cdot z) tends to zero, and will even vanish identically for \lambda small enough when the f_j have compact support. Hence, (4.3) does not give us information on the relation between \sum_{j \in S} 1/p_j and r(S) in the limit as \lambda tends to zero.

Definition Let {a_1, a_2, . . . , a_N} be a given set of vectors spanning \mathbb{R}^M. For each subset S of {1, 2, . . . , N}, define

    r(S) = dim(span({a_j : j \in S})) .    (4.4)

Let K_A denote the subset of \mathbb{R}^N consisting of vectors z such that \sum_{j=1}^N z_j = M, 0 \le z_j \le 1 for each j, and finally

    \sum_{j \in S} z_j \le r(S)    (4.5)

for each such subset S. Define K_A^\circ to be the subset of K_A consisting of vectors z satisfying

    \sum_{j \in S} z_j < r(S)    (4.6)

for all proper, non-empty subsets S of {1, 2, . . . , N}. For later use, we say that a subset S is critical at z if \sum_{j \in S} z_j = r(S) and subcritical at z if \sum_{j \in S} z_j < r(S).
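The definition of K_A is a finite list of rank conditions, so membership can be checked by brute force for small examples. The following is a minimal sketch, not taken from the paper: the function names `rank` and `classify` are hypothetical, exact rational arithmetic is assumed, and the loop over all subsets is exponential in N, so it is meant only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def rank(vectors):
    """Rank of a list of rational vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def classify(vectors, z):
    """Return 'outside', 'boundary' or 'interior' for z relative to K_A.

    vectors: the a_j, each of length M; z: candidate point with N entries.
    """
    n, m = len(vectors), len(vectors[0])
    z = [Fraction(x) for x in z]
    if sum(z) != m or any(x < 0 or x > 1 for x in z):
        return "outside"
    status = "interior"
    for size in range(1, n):  # proper, non-empty subsets only
        for subset in combinations(range(n), size):
            total = sum(z[j] for j in subset)
            r = rank([vectors[j] for j in subset])
            if total > r:
                return "outside"   # (4.5) is violated
            if total == r:
                status = "boundary"  # this subset is critical at z
    return status
```

For three pairwise non-proportional vectors in the plane, for instance `[(1, 0), (0, 1), (1, -1)]`, the point z = (2/3, 2/3, 2/3) is classified as interior, while z = (1, 1, 0) is on the boundary because the singleton subsets {1} and {2} are critical there.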

It may seem that we are being inconsistent in our notation, as we have already used K_A to denote a certain convex hull in our description of Barthe's results. We shall show below that in fact the two sets coincide. For present purposes, this is not important, and the definition of K_A shall be the one made just above.

What we have just seen shows that p \in K_A is a necessary condition for D(p) < \infty, or even D_G(p) < \infty. It turns out that it is also sufficient.

Theorem 4.2 (Barthe) Let {a_1, a_2, . . . , a_N} be any spanning set of vectors in \mathbb{R}^M. Then D_G(p) < \infty if and only if p \in K_A. Moreover, if p \in K_A^\circ, then the supremum is attained in the Gaussian variational problem (1.3).

Barthe's proof is based on the convex hull description of K_A, as mentioned above. At the end of this section we give an alternate proof, and show directly that Barthe's convex hull definition of K_A yields the same set as does our definition. First, we deduce the Brascamp-Lieb Theorem from Theorems 3.1, 4.2 and Lemma 4.1.

Theorem 4.3 (Brascamp-Lieb) Let {a_1, a_2, . . . , a_N} be any spanning set of vectors in \mathbb{R}^M. Then for all p, D_G(p) = D(p).

Proof: If p \in K_A^\circ, everything is clear. By Theorem 4.2, the Gaussian problem has optimizers, and then by Theorem 3.1, D_G(p) = D(p).

Therefore, suppose that p \in K_A, but not K_A^\circ. Then there exists a non-empty proper subset S of the indices that is critical; i.e., such that \sum_{j \in S} 1/p_j = r(S). We further take S to have the least cardinality among all such sets.

To apply the identity (4.2), consider

    D_S = \sup \Big\{ \frac{\int_{\mathbb{R}^r} \prod_{j \in S} f_j(b_j \cdot y)\, d^r y}{\prod_{j \in S} \|f_j\|_{p_j}} : f_j \in L^{p_j}(\mathbb{R}),\ j = 1, 2, . . . , N \Big\}    (4.7)

and

    D_{S^c} = \sup \Big\{ \frac{\int_{\mathbb{R}^{M-r}} \prod_{j \in S^c} f_j(c_j \cdot z)\, d^{M-r} z}{\prod_{j \in S^c} \|f_j\|_{p_j}} : f_j \in L^{p_j}(\mathbb{R}),\ j = 1, 2, . . . , N \Big\} .    (4.8)

Here, as in (4.2), r = r(S). Clearly, (4.2) yields the bound D(p) \le D_S D_{S^c}.

To obtain the opposite inequality, note that the scaling identity \sum_{j \in S} (1/p_j) = r is satisfied, and by the choice of a critical set of minimal cardinality, there are no critical subsets of S for the variational problem of computing D_S. Therefore, there is a solution of the Euler-Lagrange equation (3.12) for (4.7), and hence it has Gaussian maximizers. From (3.20) we see that we can take these maximizers g_j so that \prod_{j \in S} g_j(b_j \cdot y) is an arbitrarily good approximation of D_S times a Dirac mass at the origin. This will eliminate the terms involving y in the second integral in (4.2). Hence for any \epsilon > 0, one can choose the functions f_j, j \in S, to be Gaussian and have

    \frac{\int_{\mathbb{R}^M} \prod_{j=1}^N f_j(a_j \cdot x)\, d^M x}{\prod_{j=1}^N \|f_j\|_{p_j}} \ge (D_S - \epsilon) \frac{\int_{\mathbb{R}^{M-r}} \prod_{j \in S^c} f_j(c_j \cdot z)\, d^{M-r} z}{\prod_{j \in S^c} \|f_j\|_{p_j}} .

We are now reduced to proving that the variational problem for D_{S^c} has Gaussian maximizers. If there are no critical subsets of S^c for this problem, we are done by Theorem 4.1. Otherwise, "peel off" another critical subset. This procedure clearly reduces the cardinality of the remaining set of indices each time, and hence it terminates with a full set of Gaussian trial functions that come arbitrarily close to the supremum. This yields the identity D(p) = D_S D_{S^c} and completes the proof of Theorem 4.3. ■

The remainder of this section is devoted to the proof of Theorem 4.2. We first show 
that the two definitions of Ka do indeed define the same set. 

Theorem 4.4 For any spanning set of vectors {a_1, a_2, . . . , a_N}, K_A is the convex hull of the vectors z whose entries are either zero or one, and for which the set {a_j : z_j = 1} is a basis of \mathbb{R}^M.
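The extreme points described in Theorem 4.4 can be listed directly in small cases: each one is the indicator vector of a set of M indices whose vectors form a basis. The sketch below is illustrative only; `det` and `basis_vertices` are hypothetical helper names, and integer (or rational) coordinates are assumed so that the determinant test is exact.

```python
from itertools import combinations


def det(mat):
    """Determinant by Laplace expansion along the first row (fine for small M)."""
    if len(mat) == 1:
        return mat[0][0]
    total = 0
    for col, entry in enumerate(mat[0]):
        minor = [row[:col] + row[col + 1:] for row in mat[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def basis_vertices(vectors):
    """Indicator vectors z of the index sets {j : z_j = 1} for which the a_j form a basis."""
    n, m = len(vectors), len(vectors[0])
    vertices = []
    for subset in combinations(range(n), m):
        square = [list(vectors[j]) for j in subset]
        if det(square) != 0:  # the chosen M vectors are linearly independent
            vertices.append(tuple(1 if j in subset else 0 for j in range(n)))
    return vertices
```

With the vectors `[(1, 0), (0, 1), (1, -1)]` every pair is a basis, so the three vertices are (1,1,0), (1,0,1) and (0,1,1), and their average (2/3, 2/3, 2/3) lies in the relative interior. If two of the vectors are proportional, the corresponding pair drops out of the list.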

First we prove a lemma. 

Lemma 4.5 Consider any spanning set of vectors {a_1, a_2, . . . , a_N}, and any z in K_A. Let T be any non-empty subset of the indices {1, 2, . . . , N}. If there is any set S of indices containing T that is critical at z, then there is a least such set S_0. That is, there is a set S_0 containing T that is critical at z, such that if S is any other set containing T that is critical at z, then S_0 \subset S.

Proof: Without loss of generality, we may suppose that there is a set of indices containing T that is critical at z. Let S be such a set of least cardinality, and let \tilde{S} be any other set of indices containing T that is critical at z. Let V be the span of {a_j : j \in S}, and let W be the span of {a_j : j \in \tilde{S}}. Clearly,

    {a_j : j \in S} \cap {a_j : j \in \tilde{S}} \subset V \cap W

and

    {a_j : j \in S} \cup {a_j : j \in \tilde{S}} \subset V + W .

From the relation dim(V \cap W) + dim(V + W) = dim(V) + dim(W) it then follows that

    r(S \cap \tilde{S}) + r(S \cup \tilde{S}) \le r(S) + r(\tilde{S}) .    (4.9)

Since both S and \tilde{S} are critical at z, r(S) + r(\tilde{S}) = \sum_{j \in S} z_j + \sum_{j \in \tilde{S}} z_j. Since z \in K_A, \sum_{j \in S \cup \tilde{S}} z_j \le r(S \cup \tilde{S}), thus from (4.9)

    \sum_{j \in S \cup \tilde{S}} z_j + r(S \cap \tilde{S}) \le \sum_{j \in S} z_j + \sum_{j \in \tilde{S}} z_j .

This implies that r(S \cap \tilde{S}) \le \sum_{j \in S \cap \tilde{S}} z_j, and since T is non-empty and is a subset of both S and \tilde{S}, S \cap \tilde{S} is not empty. Hence S \cap \tilde{S} is critical at z. If S \cap \tilde{S} were a proper subset of S, then we would have found a critical subset of strictly smaller cardinality, contrary to the assumption on S. Hence S = S \cap \tilde{S}, and so S \subset \tilde{S}. The set S is the set S_0 that we seek. ■



Proof of Theorem 4.4: Suppose that z \in K_A, and for some k, 0 < z_k < 1. We shall show that in this case, z is not extreme.

First, consider the case in which no critical set contains any indices j for which 0 < z_j < 1.

Since \sum_{j=1}^N z_j = M, which is an integer, it must be the case that for some \ell \ne k, 0 < z_\ell < 1. Since neither k nor \ell belongs to any critical set, increase (resp. decrease) z_k a little, while decreasing (resp. increasing) z_\ell a little in such a way that z_k + z_\ell is constant, and the increases do not produce any supercritical sets. Clearly in this case, z is not extreme.

Second, if there are critical sets containing indices j for which 0 < z_j < 1, choose one, S, of least cardinality. Since S is critical, \sum_{j \in S} z_j is an integer, and there must be two indices k and \ell in S such that 0 < z_k, z_\ell < 1. By the lemma and the definition of S, S is the smallest critical set containing either k or \ell.

Clearly, we can increase z_k a little bit, and decrease z_\ell a little bit without affecting z_k + z_\ell, and hence without affecting \sum_{j \in S} z_j. Moreover, the increase in z_k does not increase the value of \sum_{j \in \tilde{S}} z_j for any other critical set \tilde{S} that contains k. This is because S \subset \tilde{S} by Lemma 4.5 and the definition of S, and hence \tilde{S} also contains \ell. ■

Proof of Theorem 4.2: Let \phi_A(t_1, t_2, . . . , t_N) = \ln(\det(A e^T A^*)). The function was shown to be convex on \mathbb{R}^N by Brascamp and Lieb. Let \phi_A^* denote its Legendre transform:

    \phi_A^*(z_1, z_2, . . . , z_N) = \sup_{t_1, t_2, . . . , t_N} \Big\{ \sum_{j=1}^N z_j t_j - \phi_A(t_1, t_2, . . . , t_N) \Big\} .    (4.10)

By (3.10), determining the set of vectors p for which D_G(p) < \infty is the same as determining the set of vectors z for which \phi_A^*(z) < \infty.

Next, recall a formula of Brascamp and Lieb, which can be deduced from the Cauchy-Binet formula:

    \det(A e^T A^*) = \sum_{|S| = M} t_S \det(A_S A_S^*) ,    (4.11)

where t_S = \exp(\sum_{j \in S} t_j). Here, we use the following notation: If S = {j_1, j_2, . . . , j_k}, A_S denotes the M x k matrix

    A_S = [a_{j_1}, a_{j_2}, . . . , a_{j_k}] .    (4.12)

As shown in [5], the convexity of \phi_A(t) follows by differentiating \phi_A(t + rv) twice with respect to r using the Schwarz inequality. Here v is an arbitrary fixed vector.
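The expansion (4.11) is easy to test numerically in low dimension. The sketch below assumes that T denotes the diagonal matrix with entries t_1, . . . , t_N, so that A e^T A^* = \sum_j e^{t_j} a_j a_j^*; this reading, and the helper names `det`, `lhs` and `rhs`, are assumptions made for illustration rather than notation taken from the paper.

```python
import math
from itertools import combinations


def det(mat):
    """Determinant by Laplace expansion along the first row (fine for small M)."""
    if len(mat) == 1:
        return mat[0][0]
    total = 0.0
    for col, entry in enumerate(mat[0]):
        minor = [row[:col] + row[col + 1:] for row in mat[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def lhs(vectors, t):
    """det(A e^T A^*), with the a_j as columns of A and T = diag(t) (assumed)."""
    m = len(vectors[0])
    gram = [[sum(math.exp(tj) * a[r] * a[c] for a, tj in zip(vectors, t))
             for c in range(m)] for r in range(m)]
    return det(gram)


def rhs(vectors, t):
    """Sum over |S| = M of exp(sum_{j in S} t_j) * det(A_S A_S^*)."""
    n, m = len(vectors), len(vectors[0])
    total = 0.0
    for subset in combinations(range(n), m):
        square = [list(vectors[j]) for j in subset]  # rows are the a_j, i.e. A_S^*
        # For a square A_S, det(A_S A_S^*) equals det(A_S) squared.
        total += math.exp(sum(t[j] for j in subset)) * det(square) ** 2
    return total
```

For the vectors `[(1, 0), (0, 1), (1, -1)]` both sides reduce to e^{t_1+t_2} + e^{t_1+t_3} + e^{t_2+t_3}, which the two functions reproduce up to rounding.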

Having made these remarks, we first show that \phi_A^*(z) = \infty unless \sum_{j=1}^N z_j = M and 0 \le z_j \le 1 for each j.

For any constant c and any t in \mathbb{R}^N, let t_c denote the vector in \mathbb{R}^N whose jth component is t_j + c. From the definition (3.9), it follows that \phi_A(t_c) = \phi_A(t) + Mc. Therefore,

    z \cdot t_c - \phi_A(t_c) = \Big( \sum_{j=1}^N z_j - M \Big) c + z \cdot t - \phi_A(t) ,    (4.13)

so that the domain of \phi_A^* lies in the hyperplane \sum_{j=1}^N z_j = M. Further, it follows from (3.11) that 0 \le \frac{\partial}{\partial t_j} \phi_A(t) \le 1 since this quantity is the jth diagonal entry of an orthogonal projection. Hence, every z in K_A is such that 0 \le z_j \le 1 for each j.

Recall the terminology that a subset S is critical at z if \sum_{j \in S} z_j = r(S), and subcritical at z if \sum_{j \in S} z_j < r(S).

We now show that \phi_A^*(z) < \infty if z \in K_A^\circ. First, note that if \sum_{j=1}^N z_j = M, then (4.13) reduces to

    z \cdot t_c - \phi_A(t_c) = z \cdot t - \phi_A(t) .    (4.14)

Hence in (4.10), we may restrict our focus to vectors t satisfying \min_j t_j = 0.

For any t = (t_1, . . . , t_N), let t^* = (t_1^*, . . . , t_N^*) be its decreasing rearrangement. By the invariance noted above, we may assume that t_N^* = 0. Let \pi be any permutation so that

    t_j^* = t_{\pi(j)} for all 1 \le j \le N.


Let \tilde{S} be the indices of the pivotal columns in \pi(A) = [a_{\pi(1)}, a_{\pi(2)}, . . . , a_{\pi(N)}]. That is, the columns of A_{\tilde{S}} are the columns in \pi(A) that are not in the span of the columns to their left in \pi(A). Since the dimension of the space spanned by the vectors a_1, . . . , a_N is M, we have that |\tilde{S}| = M. By monotonicity of the logarithm and (4.11),

    \phi_A(t_1, t_2, . . . , t_N) = \ln(\det(A e^T A^*)) \ge \sum_{j \in \tilde{S}} t_j + \ln(\det(A_{\tilde{S}} A_{\tilde{S}}^*)) ,    (4.15)

and hence it suffices to find a lower bound on

    \sum_{j \in \tilde{S}} t_j - \sum_{j=1}^N z_j t_j .    (4.16)

Setting a_k = 1 if \pi(k) \in \tilde{S}, and a_k = 0 otherwise, and b_k = z_{\pi(k)}, (4.16) can be written as

    \sum_{k=1}^N (a_k - b_k) t_k^* .

The point about this notation is that the vector (a_1, . . . , a_N), which has N elements, strictly majorizes the vector (b_1, . . . , b_N), i.e.,

    \sum_{j=1}^k a_j > \sum_{j=1}^k b_j ,  k = 1, . . . , N - 1 ,    (4.17)

and

    \sum_{j=1}^N a_j = \sum_{j=1}^N b_j = M .    (4.18)

The equation (4.18) follows from the definition of the a_k's and b_k's, the fact that |\tilde{S}| = M and the fact that \sum_{j=1}^N z_j = M. The equation (4.17) follows from the definition of the a_k's and b_k's and the fact that z \in K_A^\circ, i.e., for every proper subset S of {1, . . . , N}, \sum_{j \in S} z_j < r(S).

Summing by parts, using t_N^* = 0 and \sum_{k=1}^N a_k = \sum_{k=1}^N b_k,

    \sum_{k=1}^N (a_k - b_k) t_k^* = \sum_{k=1}^{N-1} \Big( \sum_{j=1}^k (a_j - b_j) \Big) (t_k^* - t_{k+1}^*) \ge c\, t_1^*

where c = \min_{1 \le k \le N-1} \sum_{j=1}^k (a_j - b_j) > 0.
Hence

    \sum_{k \in \tilde{S}} t_k - \sum_{k=1}^N z_k t_k \ge c \max_k (t_k) ,

which, together with the bound (4.15), yields

    z \cdot t - \phi_A(t) \le -\ln(\det(A_{\tilde{S}} A_{\tilde{S}}^*)) - c \max_k (t_k) .

Therefore, as any of the variables t_1, . . . , t_{N-1} tend to infinity (recall that without loss t_N can be chosen to be zero), z \cdot t - \phi_A(t_1, . . . , t_N) tends to -\infty, and so \phi_A^*(z) < \infty. By the convexity of \phi_A, proved by Brascamp and Lieb, the supremum in (4.10) is attained in this case.
It remains to show that \phi_A^*(z) < \infty for all z in K_A. This is an easy consequence of Theorem 4.4. Suppose that p is one of the vertices of K_A. If p_j = \infty, we may as well replace f_j by 1 in (1.2). Therefore, there are effectively only M vectors and functions. Letting S denote the set of indices for which p_j = 1, we have the identity

    \int_{\mathbb{R}^M} \prod_{j \in S} f_j(a_j \cdot x)\, d^M x = (\det(A_S A_S^*))^{-1/2} \prod_{j \in S} \int_{\mathbb{R}} f_j(y)\, dy ,    (4.19)

which gives us D(p) = (\det(A_S A_S^*))^{-1/2}. This is finite, and since D(p) is convex and finite at the vertices of K_A, it is finite throughout K_A. ■



5. Determination of the optimizers

A partial solution to the problem of determining all maximizers, when they exist, for (1.2) was obtained in [5] where it is proved that in the case M = 2 and N = 3, which gives the classical Young's inequality, the only non negative maximizers of the ratio in (1.2) are certain specific Gaussian functions. The method of proof extends to more general cases involving M + 1 functions in \mathbb{R}^M, but not to values of N > M + 1.

Under the additional assumption that there exists an optimizer to the Gaussian variational problem (1.3), a full determination of the non negative optimizers was obtained by Barthe [3]. He conjectures that when there is no optimizer to the Gaussian problem, there is no optimizer for the general problem (1.2). Here we give a proof of Barthe's theorem, and of his conjecture. We also determine the form of all of the complex valued optimizers.

Before we begin, note a restriction that we may impose on {a_1, a_2, . . . , a_N} without loss of generality: We may assume that if any one vector is deleted from {a_1, a_2, . . . , a_N}, then what remains still spans \mathbb{R}^M. The point is that when a_1 is necessary for the whole set to span, there is a change of coordinates under which




for some vectors b_2, . . . , b_N in \mathbb{R}^{M-1}. (The calculation in (5.1) is carried out at the end of the Appendix.) This reduces the analysis of (3.1) to an integral of the same type, but with one factor and one dimension fewer. It also shows that in this case, we must have p_1 = 1 to obtain a finite constant D. Also it is clear in this case that the optimizers need not be Gaussian, since f_1 can be any integrable function without affecting the value of the ratio in (1.2).

We therefore make the following definition:

Definition Given a spanning set S = {a_1, a_2, . . . , a_N} of vectors in \mathbb{R}^M, we say that a_j is essential in case S \setminus {a_j} does not span \mathbb{R}^M, and we say that S is properly redundant in case no vector in S is essential, and moreover, no two vectors in S are proportional.

We can always apply the reduction argument given just above to eliminate any essential vectors. Notice that if N = M, every vector is essential, and in fact, we have the identity

    \int_{\mathbb{R}^M} \prod_{j=1}^N f_j(a_j \cdot x)\, d^M x = |\det(A)|^{-1} \prod_{j=1}^M \int_{\mathbb{R}} f_j(y)\, dy .    (5.2)

For this reason, we are interested mainly in N > M.

We also see right away from (5.2) that if p is a vertex of K_A, so that M of the p_j are 1, and N - M are \infty, then we get maximizers in (1.2) if and only if we take each of the L^\infty functions to be constant, and there is no restriction on the L^1 functions. Hence for p a vertex of K_A, the maximizers are far from unique, and need not be Gaussian.

Lemma 5.1 Let {a_1, a_2, . . . , a_N} span \mathbb{R}^M. Let A = [a_1, a_2, . . . , a_N] be the M x N matrix whose j-th column is a_j. Let P be the orthogonal projection in \mathbb{R}^N onto the image of A^*. Then a_j is essential if and only if P_{jj} = 1.

Proof: By definition, a_j is essential if and only if there do not exist numbers u_1, u_2, . . . , u_N such that

    u_j \ne 0 and \sum_{k=1}^N u_k a_k = 0 .    (5.3)

Let u be the vector in \mathbb{R}^N whose kth entry is u_k. Then \sum_{k=1}^N u_k a_k = 0 is exactly the condition for u to belong to the kernel of A. Hence, a_j is essential if and only if u_j = 0 for each vector u in the kernel of A. Let e_j denote the jth standard basis vector in \mathbb{R}^N, so that u_j = e_j \cdot u. Then, since the image of A^* is the orthogonal complement of the kernel of A, we have that a_j is essential if and only if e_j belongs to the image of A^*. Clearly, this is the case if and only if P_{jj} = 1. ■
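The criterion of Lemma 5.1 can be tried out on small examples. Since the a_j span \mathbb{R}^M, the projection onto the image of A^* is P = A^*(AA^*)^{-1}A, so P_{jj} = a_j \cdot (AA^*)^{-1} a_j. The sketch below computes these diagonal entries in exact rational arithmetic; the helper names `solve` and `projection_diagonal` are hypothetical, and the code assumes the vectors do span so that AA^* is invertible.

```python
from fractions import Fraction


def solve(mat, rhs):
    """Solve mat x = rhs exactly for an invertible square matrix (Gauss-Jordan)."""
    n = len(mat)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(mat, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]


def projection_diagonal(vectors):
    """Diagonal entries P_jj = a_j . (A A^*)^{-1} a_j of the projection onto Img(A^*)."""
    m = len(vectors[0])
    gram = [[sum(a[r] * a[c] for a in vectors) for c in range(m)] for r in range(m)]
    diag = []
    for a in vectors:
        x = solve(gram, a)  # x = (A A^*)^{-1} a_j
        diag.append(sum(ai * xi for ai, xi in zip(a, x)))
    return diag
```

For `[(1, 0), (0, 1), (0, 2)]` the diagonal is (1, 1/5, 4/5): only the first vector is essential, as removing it leaves vectors that span a line. For `[(1, 0), (0, 1), (1, -1)]` every entry is 2/3, so no vector is essential. In both cases the entries sum to M = 2, the trace of a rank-M projection.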



Theorem 5.2 (Barthe) Let {a_1, a_2, . . . , a_N} be any properly redundant spanning set, and let p \in K_A. Then the solution S of the Euler-Lagrange equations (3.12) is unique up to a constant multiple. Moreover, non-negative functions f_1, . . . , f_N satisfy

    \int_{\mathbb{R}^M} \prod_{j=1}^N f_j(a_j \cdot x)\, d^M x = D(p) \prod_{j=1}^N \|f_j\|_{p_j}    (5.4)

if and only if there is a number c > 0 and a vector b \in Img(A^*) so that for each j, f_j(y) is a multiple of

    \exp\Big( -\frac{c}{2} s_j^2 (y - b_j)^2 \Big) .    (5.5)

Proof: Recall the proof of Theorem 3.1. Fix any t > 0, and let h_j(y) denote \ln(f_j(t, y)). Note that each f_j(t, y) is smooth and strictly positive, so that each h_j is smooth. Then by (3.16) we must have

    \int_{\mathbb{R}^M} \Big[ \sum_{i,j=1}^N Q_{i,j}\, h_i'(a_i \cdot x)\, h_j'(a_j \cdot x) \Big] F(x)\, d^M x = 0 ,    (5.6)

where F(x) = \prod_{j=1}^N f_j(a_j \cdot x) and Q is given by (3.17): Q = S^{-1}(I - P)S^{-1} where P is the orthogonal projection onto the image of (AS)^* = SA^*.

Step 1: (Each h_j is a quadratic polynomial) Since F is strictly positive, it follows from (5.6) that the vector

    v(x) = ( h_1'(a_1 \cdot x), h_2'(a_2 \cdot x), . . . , h_N'(a_N \cdot x) )    (5.7)

is such that S^{-1}v(x) lies in Img(SA^*) for every x. This means that if u is any vector in the kernel of AS, then u \cdot S^{-1}v(x) = 0 for all x.

For any vector u in the kernel of AS, we define the function \phi(x) by

    \phi(x) = u \cdot S^{-1}v(x) = \sum_{j=1}^N u_j s_j^{-1} h_j'(a_j \cdot x) .

Since \phi vanishes identically, 0 = \nabla\phi(x) = \sum_{j=1}^N u_j s_j^{-1} h_j''(a_j \cdot x) a_j. This means that for each x, the vector w(x) defined by

    w(x) = ( s_1^{-1} h_1''(a_1 \cdot x) u_1, s_2^{-1} h_2''(a_2 \cdot x) u_2, . . . , s_N^{-1} h_N''(a_N \cdot x) u_N )    (5.8)

lies in the kernel of A.

We shall first show that h_1''(a_1 \cdot x) is constant. To do this, write a_1 as a linear combination of the other vectors a_j: a_1 = \sum_{j=2}^N \alpha_j a_j. This is possible since a_1 is not essential. There may be many ways of doing this, but we can always choose one such that a minimal number of the \alpha's are non zero, which we do. Suppose that there are exactly k values of j, j_1, j_2, . . . , j_k, for which \alpha_j \ne 0.

The vector u = S^{-1}(1, -\alpha_2, . . . , -\alpha_N) belongs to the kernel of AS. We use this vector u in (5.8) to define w(x).

Now let y be any vector in \mathbb{R}^M that is orthogonal to a_{j_k}, but not orthogonal to a_1. Since w(x) lies in the kernel of A for every x, so does the vector we get when we differentiate each component in the y direction. That is, for each x,

    ( (y \cdot a_1) s_1^{-1} h_1'''(a_1 \cdot x) u_1, (y \cdot a_2) s_2^{-1} h_2'''(a_2 \cdot x) u_2, . . . , (y \cdot a_N) s_N^{-1} h_N'''(a_N \cdot x) u_N )

lies in the kernel of A. The j_k-th component of this vector vanishes identically since y \cdot a_{j_k} = 0. The N - k other entries for which u_j = 0 also vanish identically. This means that for each x, the above vector lies in the kernel of A, and has no more than k - 1 non zero entries. By assumption there is no vector in the kernel of A whose first component is non zero and that has fewer than k non zero entries. Hence the first component must be zero. Since y \cdot a_1 \ne 0, and u_1 \ne 0, this means h_1'''(a_1 \cdot x) = 0, and proves that h_1'' is constant.

The argument may now be repeated for each j, and we learn at this point that each h_j is a quadratic function.

Step 2: (Determination of h_j'') Let c_j denote the value of h_j''. Then, from (5.8), for any vector u in the kernel of AS, the vector whose jth entry is s_j^{-1} c_j u_j belongs to Ker(A). Since u is in the kernel of AS if and only if Su is in the kernel of A, we see that s_j^{-1} c_j must be a constant multiple of s_j. In other words, for some constant c, we have

    s_1^{-2} h_1''(a_1 \cdot x) = s_2^{-2} h_2''(a_2 \cdot x) = \cdots = s_N^{-2} h_N''(a_N \cdot x) = -c .    (5.9)

This of course means that for each j, there are scalar constants a_j and b_j so that

    h_j(y) = -\frac{c s_j^2}{2} (y - b_j)^2 + a_j ,    (5.10)

which means that

    f_j(y) = \exp\Big( -\frac{c s_j^2}{2} (y - b_j)^2 + a_j \Big) .    (5.11)

Evidently, c > 0.

Step 3: (Determination of b_j) Let b denote the vector in \mathbb{R}^N whose jth component is b_j. From (5.10) and the definition of v(x), we see that v(0) = cS^2 b. We have seen that S^{-1}v(0) lies in Img(SA^*), and so b lies in Img(A^*).

The constant b_j is the mean of the probability density f_j^{p_j}(y)/\|f_j\|_{p_j}^{p_j}, and the mean does not change under the evolution considered here, which commutes with translations. Therefore, we see that f_j(t, y) can have the form specified in (5.11) if and only if it has this form at t = 0. That is, there is equality in (5.4) if and only if there is a positive constant c, and a vector b in the image of A^* so that each f_j has the form specified in (5.11) with b_j being the jth component of b, and the a_j are arbitrary. ■

Corollary 5.3 Let {a_1, a_2, . . . , a_N} be any properly redundant spanning set. Then the function \phi(t_1, t_2, . . . , t_N) = Tr(\ln(A e^T A^*)) is strictly convex, except along the lines obtained by adding a number c to each t_j.

Proof: Were this not the case, we would have two solutions S of the Euler-Lagrange equations that would not be constant multiples of one another. ■

The strict convexity was proved by Brascamp and Lieb under the stronger hypothesis that every subset of M vectors chosen from {a_1, a_2, . . . , a_N} is linearly independent.

Concerning maximizers for p on the boundary of K_A, we have already dealt with the vertices - these have plenty of non-Gaussian maximizers, and in the strict sense considered here do not have any Gaussian optimizers: If p_j = 1, then f_j may be any non negative function, and so may be taken to be Gaussian, while if p_j = \infty, then f_j must be constant, and therefore not Gaussian. One could consider constants as degenerate Gaussians, though this would not be entirely consistent with the terminology we have been using in reference to the Gaussian optimization problem. Alternately, one can stipulate that p_j < \infty for all j. Indeed, if p_j = \infty, then the corresponding factors involving f_j can be deleted top and bottom in (1.2) without affecting the constant. We may then prove a conjecture of Barthe [3]:

Theorem 5.4 Let {a_1, a_2, . . . , a_N} be any properly redundant spanning set, and let p \in K_A \setminus K_A^\circ be such that p_j < \infty for all j. Then there may be no optimizers for the Brascamp-Lieb inequality, but when there are optimizers, there are Gaussian optimizers. Moreover, there is a constructive procedure for deciding whether or not optimizers exist in any particular case.

Proof: We again apply the factorization formula (4.2) from Lemma 4.1. As in the proof of Theorem 4.3, let S be a critical set of least cardinality. Such a set exists since p \in K_A \setminus K_A^\circ. As shown in the proof of Theorem 4.3, D(p) = D_S D_{S^c} where D_S and D_{S^c} are defined in (4.7) and (4.8) respectively. Since S was a critical subset of least cardinality, there are no critical subsets for this problem. Hence there are Gaussian optimizers for the variational problem that determined D_S.

Next, suppose that there are no critical sets in the variational problem that determines D_{S^c}. Then this problem has only Gaussian optimizers, unique up to a common scaling and certain translations. However, examining (4.2) we see that for

    \int_{\mathbb{R}^r} \prod_{j \in S} f_j(b_j \cdot y) \Big( \int_{\mathbb{R}^{M-r}} \prod_{j \in S^c} f_j(b_j \cdot y + c_j \cdot z)\, d^{M-r} z \Big) d^r y    (5.12)

to equal D_S D_{S^c} \prod_{j=1}^N \|f_j\|_{p_j}, it is necessary that the translations in the integral on the right be among those permitted by Theorem 5.2. There is a simple criterion for this: Let A_{S^c} be the matrix obtained by deleting from A the jth column whenever j \in S. Let U and V be the partial isometries used in the proof of Lemma 4.1, so that if we put B = A_{S^c}^* U and C = A_{S^c}^* V, the columns of B (resp. C) are the vectors b_j (resp. c_j) in the second integral.

When optimizers exist, it must be the case that for each y the translation in the second integral is one permitted by Theorem 5.2. Clearly this is the case if and only if Img(B) \subset Img(C). Conversely, if this is the case, all of the translations are admissible, and using Gaussian optimizers for D_{S^c} in (5.12), we will have this integral equal to D_S D_{S^c} \prod_{j=1}^N \|f_j\|_{p_j}.

The general case is handled in very much the same way: If there are critical sets in the variational problem that determines D_{S^c}, "peel these off" repeatedly until one gets a problem with no critical subsets, and hence Gaussian optimizers. Now one works one's way back up, checking the compatibility condition Img(B) \subset Img(C) each step of the way. If this is ever violated, there are no optimizers. Otherwise, we obtain a set of Gaussian optimizers. ■

One might further hope that the Gaussian functions in Theorem 5.2 are also the only optimizers of Young's inequality in the wider class of complex valued functions. However, this is not the case. The reason is that there exist in general functions \phi_j(y) with

    e^{i \sum_{j=1}^N \phi_j(a_j \cdot x)} = 1 ,    (5.13)

and thus if f_1, . . . , f_N is any set of non-negative optimizers, then e^{i\phi_1} f_1, . . . , e^{i\phi_N} f_N is a set of complex optimizers. Here are some examples. Any three vectors in \mathbb{R}^2 are linearly dependent, i.e., there is a relation \sum_{j=1}^3 \alpha_j a_j = 0. Hence with \phi_j(y) = \alpha_j y, (5.13) holds.

With four vectors there are more possibilities. E.g., pick a_1 = e_1, a_2 = e_2, a_3 = (e_1 + e_2)/\sqrt{2} and a_4 = (e_1 - e_2)/\sqrt{2}. Then the functions \phi_1(y) = \phi_2(y) = -y^2, \phi_3(y) = \phi_4(y) = y^2 again satisfy (5.13). Hence there are non-trivial complex valued optimizers. *
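The four-vector example can be confirmed by direct evaluation: with the quadratic phases above, the sum \sum_j \phi_j(a_j \cdot x) equals -x_1^2 - x_2^2 + (x_1 + x_2)^2/2 + (x_1 - x_2)^2/2 = 0, so the product of phases is 1. The short sketch below checks this numerically at a few points; the names `A`, `PHASES` and `phase_product` are illustrative only.

```python
import cmath
import math

ROOT2 = math.sqrt(2.0)

# The four vectors a_1, ..., a_4 of the example, in R^2.
A = [(1.0, 0.0), (0.0, 1.0), (1.0 / ROOT2, 1.0 / ROOT2), (1.0 / ROOT2, -1.0 / ROOT2)]

# The phases phi_1 = phi_2 = -y^2 and phi_3 = phi_4 = y^2.
PHASES = [
    lambda y: -y * y,
    lambda y: -y * y,
    lambda y: y * y,
    lambda y: y * y,
]


def phase_product(x):
    """Evaluate exp(i * sum_j phi_j(a_j . x)), the left side of (5.13)."""
    total = sum(phi(a[0] * x[0] + a[1] * x[1]) for a, phi in zip(A, PHASES))
    return cmath.exp(1j * total)
```

Up to floating-point rounding, `phase_product` returns 1 at every point of the plane, which is the identity (5.13) for this choice of vectors and phases.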

In general, let {f_1, f_2, . . . , f_N} be any set of optimizers. Define functions z_j by

    z_j(y) = f_j(y)/|f_j(y)|

where f_j(y) \ne 0, and z_j(y) = 1 otherwise. These functions take values in the unit circle in the complex plane. In order to have equality in the generalized Young inequality, it is necessary that

    \prod_{j=1}^N z_j(a_j \cdot x) = 1    (5.14)

almost everywhere.

* The possibility of complex optimizers of this type for Young's inequality was pointed out to Brascamp and Lieb by J. Fournier; see a note added in proof at the end of their paper.

Theorem 5.5 Let {a_1, . . . , a_N} be any set of vectors spanning \mathbb{R}^M such that no two vectors are multiples of one another. Let z_j, j = 1, . . . , N, be any measurable functions from \mathbb{R} to the unit circle in the complex plane such that (5.14) holds almost everywhere. Then for each j,

    z_j(y) = e^{i\phi_j(y)}

where \phi_j is a polynomial of degree at most N - M.

We first prove a lemma:

Lemma 5.6 Let z be a function from \mathbb{R} to the unit circle in the complex plane, and let n be any positive integer. Suppose that z has the following property:

    z(x + y) = z(x)\, e^{i\psi(x,y)}

where \psi(x, y) is a polynomial of degree n - 1 in x with coefficients that are measurable functions of y. Then z(x) = e^{i\phi(x)} where \phi is a polynomial of degree n.

Proof: Before beginning, notice that the modulus of z is constant, and non zero. Hence 

z is never zero. 

Consider first the case $n = 1$. Writing $w(y) = e^{i\psi(y)}$, since there is no $x$ dependence in
this case,

$$z(x + y) = z(x)w(y) .$$

Now let $\rho$ be any smooth compactly supported function on $\mathbb{R}$. Then

$$\int z(x + y)\rho(y)\,dy = z(x) \int w(y)\rho(y)\,dy .$$

Since the restriction of $w$ to any interval is a non-zero function in $L^2$ on that interval, and
since smooth, compactly supported functions are dense in this $L^2$ space, we can choose $\rho$
so that $\int w(y)\rho(y)\,dy = c \neq 0$. Then we have

$$z(x) = \frac{1}{c} \int z(x + y)\rho(y)\,dy .$$

This shows that $z$ is smooth. In particular, once we choose a branch of the logarithm for
$z(0)$, there is just one way to choose the logarithm of $z(x)$ so that it is continuous, and
then of course it is smooth. Hence there is a smooth real function $\phi$ so that $z(x) = e^{i\phi(x)}$,
and

$$\phi(x + y) = \phi(x) + \psi(y) .$$

Evidently, $\psi$ is also smooth. Applying $\partial^2/\partial x\,\partial y$ to both sides, we learn that $\phi''$ vanishes
identically, and so $\phi$ is a polynomial of first degree.

Now suppose that $n \geq 2$. Here the argument is similar, but requires one more step. We
first write

$$z(x + y) = z(x)e^{i\psi(x,y)} .$$

Pick any $x_0$, and choose a smooth, compactly supported function $\rho$ as above so that for
this $x_0$,

$$\int e^{i\psi(x_0,y)}\rho(y)\,dy \neq 0 .$$

Now, no matter how large the coefficients of the polynomial $\psi(x,y)$ may be at certain
$y$ in the support of $\rho$, the function

$$x \mapsto \int e^{i\psi(x,y)}\rho(y)\,dy$$

is continuous in $x$ by the Dominated Convergence Theorem.

We conclude that $c(x) = \int e^{i\psi(x,y)}\rho(y)\,dy$ is continuous and non-zero on a neighborhood
of $x_0$. Hence

$$z(x) = \frac{1}{c(x)} \int z(x + y)\rho(y)\,dy$$

is continuous on a neighborhood of $x_0$. Since $x_0$ is arbitrary, $z$ is continuous.

It now follows that $e^{i\psi(x,y)}$ is continuous in both $x$ and $y$. Therefore, the coefficients
are uniformly bounded functions of $y$ in any compact interval. This means that all of
the partial derivatives in $x$ of $e^{i\psi(x,y)}$ are integrable and continuous, and so the function
$c(x) = \int e^{i\psi(x,y)}\rho(y)\,dy$ that we defined above is not only continuous, it is actually smooth
in $x$. It now follows that $z$ is smooth, and as before we write $z = e^{i\phi}$, and have

$$\phi(x + y) = \phi(x) + \psi(x, y) .$$

Taking $n$ derivatives in $x$, and using the hypothesis that $\psi(x, y)$ has degree $n - 1$ in $x$, we
see that the $n$th derivative of $\phi$ is constant. Hence $\phi$ is a polynomial of degree $n$. ■
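As a quick sanity check of the lemma in the case $n = 2$, one can start from a quadratic phase and verify that its increment has degree $n - 1 = 1$ in $x$. The coefficients $a$, $b$, $c$ below are hypothetical real constants introduced only for this illustration:

```latex
\begin{align*}
\phi(x) &= a x^2 + b x + c, \qquad z(x) = e^{i\phi(x)}, \\
\psi(x,y) &= \phi(x+y) - \phi(x) = (2 a y)\,x + (a y^2 + b y), \\
\frac{\partial^2}{\partial x^2}\,\phi(x+y)
  &= \frac{\partial^2}{\partial x^2}\bigl(\phi(x) + \psi(x,y)\bigr)
  \;\Longrightarrow\; \phi''(x+y) = \phi''(x) = 2a .
\end{align*}
```

Here $\psi$ is of first degree in $x$ with coefficients depending on $y$, and two derivatives in $x$ remove it, leaving a second derivative of $\phi$ that is independent of $x$, which is the mechanism used in the last step of the proof.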



It is of course well known that if $\phi$ and $\psi$ are two measurable functions on $\mathbb{R}$ such that

$$\phi(x + y) = \phi(x) + \psi(y)$$

then both $\phi$ and $\psi$ are first degree polynomials. Lemma 5.6 generalizes this in several
respects. It seems likely that it may be known, but we cannot find any reference for it.

Proof of Theorem 5.5: We can easily eliminate any essential vectors from $\{a_1, \dots, a_N\}$:
If $a_j$ is essential, it is clear that $z_j$ is constant. Hence we may assume that $\{a_1, \dots, a_N\}$ is
properly spanning.

It suffices by symmetry to show that $z_1$ has the specified form. Choose a basis for
$\mathbb{R}^M$ from $\{a_1, \dots, a_N\}$ that contains $a_1$. After renumbering, we may assume this is
$\{a_1, \dots, a_M\}$. Let $b_1$ be a vector that is orthogonal to the span of $\{a_2, \dots, a_M\}$, and
scaled so that $b_1 \cdot a_1 = 1$.

Now, for any $y_1$ in $\mathbb{R}$, translate the identity (5.14) by replacing $x$ with $x + y_1 b_1$. Since $b_1$
is orthogonal to $a_j$ for $2 \leq j \leq M$, the corresponding factors are unaffected by translation,
and hence

$$z_1((a_1 \cdot x) + y_1) \prod_{j=M+1}^{N} z_j(a_j \cdot (x + y_1 b_1)) = z_1(a_1 \cdot x) \prod_{j=M+1}^{N} z_j(a_j \cdot x) .$$

Let $T_y$ be the operator

$$T_y(z)(x) = \frac{z(x + y)}{z(x)} .$$

Then defining

$$z_1^{(1)}(t; y_1) = T_{y_1} z_1(t) ,$$

and defining $w_j(a_j \cdot x) = z_j(a_j \cdot (x + y_1 b_1))/z_j(a_j \cdot x)$ for $j \geq M + 1$,

$$z_1^{(1)}(a_1 \cdot x; y_1) \prod_{j=M+1}^{N} w_j(a_j \cdot x) = 1 . \qquad (5.15)$$

This is of the same form as (5.14), but with fewer functions.

Next, choose $b_2$ so that $b_2 \cdot a_{M+1} = 0$ (if it wasn't the case already that $b_1 \cdot a_{M+1} = 0$),
but $b_2 \cdot a_1 = 1$. Making the same sort of translation in (5.15), but this time by $y_2 b_2$, we
eliminate the second factor by dividing through, so that the first factor becomes

$$z_1^{(2)}(t; y_1, y_2) = T_{y_2} z_1^{(1)}(t; y_1) .$$

Proceeding in this way, we eventually learn that for some $k \leq N - M$,

$$T_{y_{k+1}} z_1^{(k)}(t; y_1, \dots, y_k)$$

is independent of $t$.

By Lemma 5.6, it follows that $z_1^{(k)}(t; y_1, \dots, y_k) = e^{i\phi(t; y_1, \dots, y_k)}$ where $\phi(t; y_1, \dots, y_k)$ is
a first degree polynomial in $t$ with coefficients that are measurable in $y_1, \dots, y_k$. But by
definition,

$$z_1^{(k)}(t; y_1, \dots, y_k) = T_{y_k} z_1^{(k-1)}(t; y_1, \dots, y_{k-1}) .$$

Applying Lemma 5.6 again, we learn the form of $z_1^{(k-1)}$. Proceeding in this way, we learn
the form of $z_1$. ■

Once one knows that the possible phase functions are polynomials of limited degree, it
is a problem in linear algebra to determine them explicitly for any particular set of vectors
$\{a_1, \dots, a_N\}$.

6. The best best constant

Let $\{a_1, a_2, \dots, a_N\}$ be a properly redundant spanning set in $\mathbb{R}^M$, and let $P$ denote
the orthogonal projection onto the image of $A^t$. Notice that $\mathrm{Tr}(P) = \mathrm{rank}(A^t) = M$.
Also, since $P$ is an orthogonal projection, each diagonal entry $P_{jj}$ satisfies $0 \leq P_{jj} \leq 1$.
Furthermore, since no column of $A$ is zero, we actually have $0 < P_{jj}$ for each $j$.

Indeed, since $\mathrm{rank}(A) = M$, $AA^t$ is positive definite, and $P = A^t(AA^t)^{-1}A$. Therefore,

$$P_{jj} = e_j \cdot A^t(AA^t)^{-1}Ae_j = a_j \cdot (AA^t)^{-1}a_j > 0 .$$

Hence, if we define $p_j^0$ by

$$\frac{1}{p_j^0} = P_{jj} = a_j \cdot (AA^t)^{-1}a_j > 0 ,$$

we have that whenever $\{a_1, a_2, \dots, a_N\}$ is properly redundant, $1 < p_j^0 < \infty$ for each $j$, and
also $\sum_{j=1}^{N} 1/p_j^0 = \mathrm{Tr}(P) = M$, so that (4.1) is satisfied. Moreover, the Euler-Lagrange
equation (3.12) is then satisfied with $S = I$.

Definition Let $\{a_1, a_2, \dots, a_N\}$ be a properly redundant spanning set of vectors in $\mathbb{R}^M$,
and let $A = [a_1, a_2, \dots, a_N]$. Let $P$ be the orthogonal projection in $\mathbb{R}^N$ onto the image of
$A^t$. For $j = 1, 2, \dots, N$, define $p_j^0 = 1/P_{jj}$. Then $p^0 = \{p_1^0, p_2^0, \dots, p_N^0\}$ is the canonical
set of $L^p$ indices corresponding to $\{a_1, a_2, \dots, a_N\}$. The terminology will be justified by
Theorem 6.1 below.

Since for $p = p^0$, the Euler-Lagrange equations (3.12) are satisfied with $S = I$, it
follows from (3.19) that

$$D(p^0) = \prod_{j=1}^{N} (p_j^0)^{1/(2p_j^0)} \det(AA^t)^{-1/2} . \qquad (6.1)$$

Notice that while computing $D(p)$ for given indices is a nonlinear optimization
problem, calculating $D(p^0)$ is a simple matter of linear algebra. This is significant since
it turns out that given the vectors $\{a_1, a_2, \dots, a_N\}$, $D(p^0)$ is the "best best constant" in
the generalized Young's inequality

$$\int_{\mathbb{R}^M} \prod_{j=1}^{N} f_j(a_j \cdot x)\, d^M x \leq D(p) \prod_{j=1}^{N} \|f_j\|_{p_j} .$$

This justifies the terminology "canonical $L^p$ indices":



Theorem 6.1: For any properly redundant spanning set, $D(p^0) < D(p)$ for all $p \neq p^0$.

Proof: Let $\phi_A$ be the function defined by (3.9), and $\phi_A^*$ its Legendre transform. It was
shown by Brascamp and Lieb that $\phi_A$ is convex. Since $\phi_A$ is smooth as well as convex, $\phi_A^*$
is strictly convex. From (3.10) and (4.10), we have

$$2\ln(D(p)) = \phi_A^*\left(\frac{1}{p_1}, \frac{1}{p_2}, \dots, \frac{1}{p_N}\right) . \qquad (6.2)$$

By the definition of $p^0$ and the Euler-Lagrange equation (3.12), if $p_j^0$ is the $j$th
canonical index,

$$\left(\frac{1}{p_1^0}, \frac{1}{p_2^0}, \dots, \frac{1}{p_N^0}\right) = \nabla\phi_A(0) . \qquad (6.3)$$

But since the gradients of Legendre transforms are inverse to one another,

$$\nabla\phi_A^*\left(\frac{1}{p_1^0}, \frac{1}{p_2^0}, \dots, \frac{1}{p_N^0}\right) = 0 .$$

This proves that the vector on the left in (6.3) is a critical point of $\phi_A^*$. Since $\phi_A^*$ is strictly
convex, it is the unique minimizer. ■



We also note that formula (6.2) displays $D(p)$ as a log-convex function of $p$. This can
be used to produce arbitrarily sharp upper bounds on $D(p)$ for a given set of indices:
Using Newton's method or some other means of generating explicit approximate solutions
of the Euler-Lagrange equations (3.12), generate several approximate solutions. For each
such solution, compute the "best best constant" for the rescaled set $\{s_1 a_1, s_2 a_2, \dots, s_N a_N\}$. If $p$ can be written
as a convex combination of the corresponding vectors of canonical inverse indices,
then $D(p)$ can be bounded above by a convex combination of the corresponding "best best
constants".

Special cases of the canonical indices have arisen in applications of the Brascamp-Lieb
inequality. A beautiful application to convex geometry by Keith Ball [1] concerned
a situation in which $N$ unit vectors $u_1, \dots, u_N$ satisfy

$$\sum_{j=1}^{N} c_j u_j u_j^t = I_{M \times M} \qquad (6.4)$$

where the $c_j$ are positive numbers. Clearly, $\sum_{j=1}^{N} c_j = M$. Let $a_j = \sqrt{c_j}\, u_j$. Then (6.4)
becomes $AA^t = I_{M \times M}$. It follows that the orthogonal projection onto the image of $A^t$ is
simply $A^t A$, and the $j$th diagonal entry is $c_j$. Hence taking $p_j = 1/c_j$ gives the canonical
indices in this case. These were the indices used by Ball in his application.
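The relation between the decomposition (6.4) and the canonical indices can be checked numerically on a small case. The following is only a sketch under stated assumptions: the three unit vectors at angles 0, 120 and 240 degrees in the plane, and the equal weights $c_j = 2/3$, are a hypothetical configuration chosen for illustration and are not taken from [1].

```python
import math

# Hypothetical illustration: three unit vectors in R^2 at angles 0, 120, 240 degrees.
M = 2
angles = [0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0]
u = [(math.cos(t), math.sin(t)) for t in angles]
c = [2.0 / 3.0] * 3  # assumed weights c_j; note that sum(c) = M = 2

# Rescaled vectors a_j = sqrt(c_j) u_j, the columns of A.
a = [(math.sqrt(cj) * ux, math.sqrt(cj) * uy) for cj, (ux, uy) in zip(c, u)]

# A A^t = sum_j a_j a_j^t = sum_j c_j u_j u_j^t, which (6.4) requires to be the identity.
AAt = [[sum(v[r] * v[s] for v in a) for s in range(M)] for r in range(M)]

# With A A^t = I the projection is P = A^t A, so P_jj = |a_j|^2 = c_j.
P_diag = [ax * ax + ay * ay for ax, ay in a]
p_canonical = [1.0 / Pjj for Pjj in P_diag]

print(AAt)          # numerically the 2 x 2 identity
print(p_canonical)  # each entry is numerically 1 / c_j
```

In this configuration each index equals $1/c_j = 3/2$, and the inverse indices sum to $M = 2$, as the scaling condition requires.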
Since for the canonical indices, the Euler-Lagrange equation (3.12) is then satisfied
with $S = I$, the heat flow interpolation argument of Section 3 gives an especially simple
proof of the inequality in this case. For this reason, the method of proof developed here
works very simply in Keith Ball's context; see [4] for more information.

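Example: Consider the five vectors

$$a_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad a_2 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad a_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad a_4 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad a_5 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} .$$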

It is easily seen that this is a properly redundant spanning set. Notice that the first three
vectors all lie in the plane $x_1 + x_2 + x_3 = 0$. As long as $0 < 1/p_j < 1$ for each $j$,



(6.5) 



Therefore, as long as $r(S) \geq \min\{|S|, M\}$ and (4.1) is satisfied, there are no super-critical
sets. The only set $S$ with $r(S) < \min\{|S|, M\}$ is $S = \{1, 2, 3\}$. Therefore, as long
as $\frac{1}{p_1} + \frac{1}{p_2} + \frac{1}{p_3} < 2$, together with the scaling condition (4.1) and $0 < 1/p_j < 1$ for each $j$,
are all satisfied, $p$ belongs to $K_A^0$, and $K_A$ is the closure of the points obtained in this
way. An easy computation shows that the canonical indices for this example are $p_1 = 2$,
and $p_2 = p_3 = p_4 = p_5 = 8/5$. By Theorem 4.4, $K_A$ has 9 vertices, and is their convex
hull.
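The "easy computation" of canonical indices can be sketched in a few lines of exact rational arithmetic, using $1/p_j^0 = a_j \cdot (AA^t)^{-1} a_j$. This is a minimal sketch, not the authors' procedure. It assumes the five vectors are $a_1 = (1,-1,0)$, $a_2 = (0,1,-1)$, $a_3 = (-1,0,1)$, $a_4 = (1,0,0)$, $a_5 = (0,1,0)$; the helper names `solve` and `canonical_indices` are hypothetical and introduced only here.

```python
from fractions import Fraction

def solve(G, b):
    """Solve G x = b exactly by Gauss-Jordan elimination (G square and invertible)."""
    n = len(G)
    # Augmented matrix [G | b] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(G, b)]
    for col in range(n):
        # Swap a row with a non-zero pivot into position, then normalize it.
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def canonical_indices(vectors):
    """Return [p_j^0] with 1/p_j^0 = a_j . (A A^t)^{-1} a_j."""
    m = len(vectors[0])
    # A A^t = sum_j a_j a_j^t, an m x m matrix.
    G = [[sum(Fraction(a[r] * a[c]) for a in vectors) for c in range(m)]
         for r in range(m)]
    indices = []
    for a in vectors:
        x = solve(G, a)                              # x = (A A^t)^{-1} a_j
        P_jj = sum(ai * xi for ai, xi in zip(a, x))  # diagonal entry of P
        indices.append(1 / P_jj)
    return indices

# Assumed example vectors (see the lead-in above).
example = [(1, -1, 0), (0, 1, -1), (-1, 0, 1), (1, 0, 0), (0, 1, 0)]
p0 = canonical_indices(example)
print(p0)
print(sum(1 / p for p in p0))  # trace of P, which should equal M = 3
```

For these assumed vectors the computation returns $p_1 = 2$ and $p_2 = \dots = p_5 = 8/5$, with inverse indices summing to 3. Exact fractions are used instead of floating point so that the indices appear as rational numbers rather than decimal approximations.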



Appendix 

In this section we exhibit trial functions that show the optimality of Theorem 1 and 
Theorem 2, and describe the change of variables leading to (5.1). 

First we show that the inequality in Theorem 1 cannot hold with any constant if the
index $p$ of the norms on the right side is less than 2. For any given $0 < \alpha < 1$, define
$f(v)$ by

$$f(v) = |v|^{-\alpha} + (1 - v^2)^{-\alpha(N-1)/2} .$$

Then $\int_{[-1,1]} f^p \, d\nu_N < \infty$ as long as $p\alpha < 1$, as one easily sees from (1.10).

On the other hand, discarding one term in each factor,

$$\prod_{j=1}^{N} f(v_j) \geq \left( \prod_{j=1}^{N-1} |v_j|^{-\alpha} \right) (1 - v_N^2)^{-\alpha(N-1)/2} .$$

We can parameterize the upper and lower hemispheres of $S^{N-1}$ using the coordinates
$(v_1, \dots, v_{N-1})$. The integral over $S^{N-1}$ is then easily converted into an integral over the
unit ball in $\mathbb{R}^{N-1}$. Doing this in radial coordinates, we have, since $|v_N| = \sqrt{1 - r^2}$ in these
coordinates,

$$\int_{S^{N-1}} \prod_{j=1}^{N} f(v_j) \, d\mu \geq C \int_0^1 r^{-2\alpha(N-1)} \frac{r^{N-2}}{\sqrt{1 - r^2}} \, dr ,$$

where $C$ is a positive constant resulting from the angular integration. This integral diverges
unless $\alpha < 1/2$.

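The conclusion is that for all $N$ and all $p < 2$, there is a positive function $f$ so that

$$\int_{S^{N-1}} f^p(v_1) \, d\mu < \infty \quad \text{while} \quad \int_{S^{N-1}} \left( \prod_{j=1}^{N} f(v_j) \right) d\mu = \infty .$$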

Next we turn to the entropy inequality in Theorem 2. Consider a spherical cap on
the sphere $S^{N-1}$ centered at the point $v_1 = 1, v_2 = 0, \dots, v_N = 0$ of radius $\varepsilon$, and denote its
characteristic function by $\chi_\varepsilon$. Define

$$F = H\chi_\varepsilon \quad \text{with} \quad H = \left( \int_{S^{N-1}} \chi_\varepsilon \, d\mu \right)^{-1} .$$

Clearly, $H$ is of order $\varepsilon^{-(N-1)}$, and hence for $\varepsilon$ small, $S(F)$ is of order

$$-\log(H) \qquad (7.2)$$

which is of order $(N - 1)\log(\varepsilon)$. Since the function is invariant under all rotations that fix
the $v_1$ axis, we get that the entropy of the first marginal is also given by (7.2). Moreover, the
$j$-th marginal can be thought of as averaging the function $H\chi_\varepsilon$ over all rotations that keep
the axis $v_j$ fixed; the resulting function is essentially a multiple of a characteristic function
of a band of width $2\varepsilon$ that is centered at the equator perpendicular to the $v_j$ axis. Call
this function $f_j := h\psi_\varepsilon$, where $\psi_\varepsilon$ is the characteristic function of the band. Since the
integral of this function must be equal to one, the height
of this function must be $h = \left( \int_{S^{N-1}} \psi_\varepsilon \, d\mu \right)^{-1}$, and is of order $1/\varepsilon$. Hence its entropy is of
order $\log(\varepsilon)$. Thus the sum of the entropies of the marginals is given, in leading order, by
$2(N - 1)\log(\varepsilon)$, which is twice the entropy of the function $F$. This shows that the constant
2 in the entropy inequality is sharp.

Finally, the coordinate change leading to (5.1) may be described as follows: Suppose
that $a_1$ is not in the span of $\{a_2, \dots, a_N\}$. Let $\{u_1, u_2, \dots, u_M\}$ be an orthonormal basis of
$\mathbb{R}^M$ so that $\{u_2, \dots, u_M\}$ has the same span as $\{a_2, \dots, a_N\}$. Let $R$ be the matrix given
by $R = [a_1, u_2, \dots, u_M]$. (That is, the first column of $R$ is $a_1$, the second column is $u_2$, and
so forth.) Then $R$ is invertible, and we can define new coordinates $z$ by $z = R^t x$. With
this definition, $z_1 = a_1 \cdot x$. Moreover, for $j \geq 2$,

$$a_j \cdot x = a_j \cdot (R^t)^{-1} z = (R^{-1} a_j) \cdot z .$$

Since $R^{-1} a_j$ is the coordinate vector of $a_j$ with respect to the basis $\{a_1, u_2, \dots, u_M\}$,
$(R^{-1} a_j)_1 = 0$ for $j \geq 2$. Therefore, defining $w$ in $\mathbb{R}^{M-1}$ by $w_j = z_{j+1}$, there are uniquely
determined vectors $b_j$ in $\mathbb{R}^{M-1}$ so that $(R^{-1} a_j) \cdot z = b_j \cdot w$. Since

$$d^M x = \frac{1}{|u_1 \cdot a_1|} \, d^M z = \frac{1}{|u_1 \cdot a_1|} \, dz_1 \, d^{M-1} w ,$$

we have (5.1).
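The Jacobian factor $1/|u_1 \cdot a_1|$ can be checked directly. The sketch below introduces the orthogonal matrix $U = [u_1, u_2, \dots, u_M]$ and the standard basis vectors $e_j$, symbols brought in only for this check, and uses $u_i \cdot u_j = \delta_{ij}$:

```latex
\begin{align*}
U^t R &= \bigl[\, U^t a_1,\; e_2,\; \dots,\; e_M \,\bigr],
  \qquad (U^t a_1)_i = u_i \cdot a_1, \\
|\det R| &= |\det(U^t R)| = |u_1 \cdot a_1|, \\
d^M z &= |\det R^t| \, d^M x = |u_1 \cdot a_1| \, d^M x .
\end{align*}
```

The first row of $U^t R$ is $(u_1 \cdot a_1, 0, \dots, 0)$, so expanding the determinant along that row gives the second line. The factor is non-zero precisely because $a_1$ is not in the span of $\{a_2, \dots, a_N\}$.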